Context:

Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. This dataset contains web-session data from an online retail store. E-commerce data is information about the visitors and performance of an online shop, and it is mostly used by marketers, e.g. to understand consumer behavior and improve conversion funnels.

Objective:

The objective is to find the features that carry the most information for separating the positive class (sessions that end in a purchase) from the negative class, and to build a model that predicts whether a customer will buy a product.

Dataset

The data contains information on web sessions of a customer:


  • "Administrative", "Administrative Duration", "Informational", "Informational Duration", "Product Related" and "Product Related Duration": these represent the number of pages of each type that the visitor viewed in the session and the total time spent on each of these page categories.
  • The values of these features are derived from the URL information of the pages visited by the user and updated in real-time when a user takes an action, e.g. moving from one page to another.

  • The "Bounce Rate", "Exit Rate" and "Page Value" features represent the metrics measured by "Google Analytics" for each page in the e-commerce site.
  • Bounce Rate: The value of "Bounce Rate" feature for a web page refers to the percentage of visitors who enter the site from that page and then leave ("bounce") without triggering any other requests to the analytics server during that session.

  • Exit Rate: The value of the "Exit Rate" feature for a specific web page is the percentage of all pageviews of that page that were the last in the session.

  • The dataset records the average bounce rate and exit rate of the pages the customer landed on.
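To make the two definitions concrete, here is a minimal sketch that computes both rates from a small, hypothetical session log. It assumes a bounce is simply a session with a single pageview, which simplifies how Google Analytics actually detects bounces; the page names and sessions are invented for illustration.

```python
from collections import Counter

# Hypothetical session log: each session is the ordered list of pages viewed.
sessions = [
    ["home"],                     # entered on "home" and left: a bounce
    ["home", "product", "cart"],
    ["product"],                  # entered on "product" and left: a bounce
    ["product", "home"],
]

entrances = Counter(s[0] for s in sessions)                # sessions that start on each page
bounces = Counter(s[0] for s in sessions if len(s) == 1)   # single-page sessions (simplified bounce)
pageviews = Counter(page for s in sessions for page in s)  # every view of each page
exits = Counter(s[-1] for s in sessions)                   # last page of each session


def bounce_rate(page):
    """Share of sessions entering on `page` that viewed no other page."""
    return bounces[page] / entrances[page] if entrances[page] else 0.0


def exit_rate(page):
    """Share of all pageviews of `page` that were the last in their session."""
    return exits[page] / pageviews[page] if pageviews[page] else 0.0


for page in ["home", "product", "cart"]:
    print(page, bounce_rate(page), round(exit_rate(page), 3))
```

Note how `cart` has an exit rate of 1.0 but a bounce rate of 0: every view of it ended a session, yet no session started there.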


  • Page Value: The "Page Value" feature represents the average value for a web page that a user visited before completing an e-commerce transaction.



  • Special Day: The "Special Day" feature indicates how close the time of the site visit is to a specific special day (e.g. Mother’s Day, Valentine's Day) on which sessions are more likely to end in a transaction.
  • The value of this attribute is determined by considering the dynamics of e-commerce, such as the time between the order date and the delivery date.
  • For example, for Valentine's Day, this value is nonzero between February 2 and February 12, zero outside this window unless the date is close to another special day, and reaches its maximum value of 1 on February 8.

  • The dataset also includes the operating system, browser, region and traffic type of each session; these values are masked.

  • VisitorType: returning visitor, new visitor, or other types of customer.

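  • Weekend: a Boolean value indicating whether the visit took place on a weekend.
  • Month: the month of the year of the visit.
  • Revenue: a Boolean value indicating whether the session ended in a purchase; this is the target to predict.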
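The Valentine's Day example above can be sketched as a function of the visit date. This is a hypothetical reconstruction assuming a linear ramp up to February 8 and down to February 12; the exact formula used to build the dataset is not given, and the SpecialDay column only takes values in steps of 0.2, so the real construction is presumably discretized. The year 2017 is arbitrary.

```python
from datetime import date

# Hypothetical reconstruction: zero outside (Feb 2, Feb 12), rising linearly
# to 1 on Feb 8, then falling linearly. Year chosen arbitrarily.
START, PEAK, END = date(2017, 2, 2), date(2017, 2, 8), date(2017, 2, 12)


def special_day_value(visit):
    if visit <= START or visit >= END:
        return 0.0
    if visit <= PEAK:
        # linear ramp up over the days from START to PEAK
        return (visit - START).days / (PEAK - START).days
    # linear ramp down over the days from PEAK to END
    return (END - visit).days / (END - PEAK).days


for d in [date(2017, 2, 1), date(2017, 2, 5), date(2017, 2, 8), date(2017, 2, 10)]:
    print(d, round(special_day_value(d), 2))
```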

Import the necessary packages

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
warnings.filterwarnings('ignore')

Read the dataset

In [2]:
shoppers = pd.read_csv('online_shoppers_intention.csv')
In [3]:
# copying data to another variable to avoid any changes to the original data
data=shoppers.copy()

View the first and last 10 rows of the dataset.

In [4]:
data.head(10)
Out[4]:
Administrative Administrative_Duration Informational Informational_Duration ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues SpecialDay Month OperatingSystems Browser Region TrafficType VisitorType Weekend Revenue
0 0 0.0 0 0.0 1 0.000000 0.200000 0.200000 0.0 0.0 Feb 1 1 1 1 Returning_Visitor False False
1 0 0.0 0 0.0 2 64.000000 0.000000 0.100000 0.0 0.0 Feb 2 2 1 2 Returning_Visitor False False
2 0 0.0 0 0.0 1 0.000000 0.200000 0.200000 0.0 0.0 Feb 4 1 9 3 Returning_Visitor False False
3 0 0.0 0 0.0 2 2.666667 0.050000 0.140000 0.0 0.0 Feb 3 2 2 4 Returning_Visitor False False
4 0 0.0 0 0.0 10 627.500000 0.020000 0.050000 0.0 0.0 Feb 3 3 1 4 Returning_Visitor True False
5 0 0.0 0 0.0 19 154.216667 0.015789 0.024561 0.0 0.0 Feb 2 2 1 3 Returning_Visitor False False
6 0 0.0 0 0.0 1 0.000000 0.200000 0.200000 0.0 0.4 Feb 2 4 3 3 Returning_Visitor False False
7 1 0.0 0 0.0 0 0.000000 0.200000 0.200000 0.0 0.0 Feb 1 2 1 5 Returning_Visitor True False
8 0 0.0 0 0.0 2 37.000000 0.000000 0.100000 0.0 0.8 Feb 2 2 2 3 Returning_Visitor False False
9 0 0.0 0 0.0 3 738.000000 0.000000 0.022222 0.0 0.4 Feb 2 4 1 2 Returning_Visitor False False
In [5]:
data.tail(10)
Out[5]:
Administrative Administrative_Duration Informational Informational_Duration ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues SpecialDay Month OperatingSystems Browser Region TrafficType VisitorType Weekend Revenue
12320 0 0.00 0 0.0 8 143.583333 0.014286 0.050000 0.000000 0.0 Nov 2 2 3 1 Returning_Visitor False False
12321 0 0.00 0 0.0 6 0.000000 0.200000 0.200000 0.000000 0.0 Nov 1 8 4 1 Returning_Visitor False False
12322 6 76.25 0 0.0 22 1075.250000 0.000000 0.004167 0.000000 0.0 Dec 2 2 4 2 Returning_Visitor False False
12323 2 64.75 0 0.0 44 1157.976190 0.000000 0.013953 0.000000 0.0 Nov 2 2 1 10 Returning_Visitor False False
12324 0 0.00 1 0.0 16 503.000000 0.000000 0.037647 0.000000 0.0 Nov 2 2 1 1 Returning_Visitor False False
12325 3 145.00 0 0.0 53 1783.791667 0.007143 0.029031 12.241717 0.0 Dec 4 6 1 1 Returning_Visitor True False
12326 0 0.00 0 0.0 5 465.750000 0.000000 0.021333 0.000000 0.0 Nov 3 2 1 8 Returning_Visitor True False
12327 0 0.00 0 0.0 6 184.250000 0.083333 0.086667 0.000000 0.0 Nov 3 2 1 13 Returning_Visitor True False
12328 4 75.00 0 0.0 15 346.000000 0.000000 0.021053 0.000000 0.0 Nov 2 2 3 11 Returning_Visitor False False
12329 0 0.00 0 0.0 3 21.250000 0.000000 0.066667 0.000000 0.0 Nov 3 2 1 2 New_Visitor True False

Understand the shape of the dataset.

In [6]:
data.shape
Out[6]:
(12330, 18)
  • The dataset has 12330 rows and 18 columns.

Let's check for duplicate rows and remove any that we find.

In [7]:
data[data.duplicated()].count()
Out[7]:
Administrative             125
Administrative_Duration    125
Informational              125
Informational_Duration     125
ProductRelated             125
ProductRelated_Duration    125
BounceRates                125
ExitRates                  125
PageValues                 125
SpecialDay                 125
Month                      125
OperatingSystems           125
Browser                    125
Region                     125
TrafficType                125
VisitorType                125
Weekend                    125
Revenue                    125
dtype: int64

The output above shows 125 duplicate rows, so let's drop them.

In [8]:
data.drop_duplicates(inplace=True)
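`duplicated()` returns one boolean per row, so summing it gives the duplicate count directly instead of repeating it once per column. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame in which row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "PageValues": [0.0, 12.24, 0.0],
    "Revenue": [False, True, False],
})

n_dupes = df.duplicated().sum()   # True for each row identical to an earlier one
deduped = df.drop_duplicates()    # keeps the first occurrence of each row by default
print(n_dupes, deduped.index.tolist())
```

`drop_duplicates` keeps the original index labels, which is why `data.info()` below reports 12205 entries indexed from 0 to 12329.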

Check the data types of the columns for the dataset.

In [9]:
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12205 entries, 0 to 12329
Data columns (total 18 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Administrative           12205 non-null  int64  
 1   Administrative_Duration  12205 non-null  float64
 2   Informational            12205 non-null  int64  
 3   Informational_Duration   12205 non-null  float64
 4   ProductRelated           12205 non-null  int64  
 5   ProductRelated_Duration  12205 non-null  float64
 6   BounceRates              12205 non-null  float64
 7   ExitRates                12205 non-null  float64
 8   PageValues               12205 non-null  float64
 9   SpecialDay               12205 non-null  float64
 10  Month                    12205 non-null  object 
 11  OperatingSystems         12205 non-null  int64  
 12  Browser                  12205 non-null  int64  
 13  Region                   12205 non-null  int64  
 14  TrafficType              12205 non-null  int64  
 15  VisitorType              12205 non-null  object 
 16  Weekend                  12205 non-null  bool   
 17  Revenue                  12205 non-null  bool   
dtypes: bool(2), float64(7), int64(7), object(2)
memory usage: 1.6+ MB

Insights:

  • Most of the columns are either int64 or float64.
  • Two columns, Month and VisitorType, have the object data type, which means we need to convert them into a suitable data type before we feed the data into a model.
  • The last two columns, "Weekend" and "Revenue", have the bool data type.

Think about it:

  • We already know how to convert categorical data types into a suitable form, using either "LabelEncoder" or "OneHotEncoder".
  • But here we have a bool data type. What should we do here?
  • In Python, bool is a subclass of int, so True and False behave like the integers 1 and 0:

    True == 1

    False == 0

    This means that an algorithm running in pure Python should work without any conversion.
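A quick check of this behavior, plus the pandas side of it, on a hypothetical frame shaped like Weekend and Revenue (the values are invented). The explicit `astype(int)` is only needed if some tool insists on integer columns.

```python
import pandas as pd

# bool is a subclass of int in Python, so True/False act as 1/0 in arithmetic.
print(isinstance(True, int), True + True, sum([True, False, True]))

# Hypothetical frame with bool columns shaped like Weekend and Revenue.
df = pd.DataFrame({"Weekend": [False, True, True], "Revenue": [True, False, True]})

conversion_rate = df["Revenue"].mean()  # mean of a bool column = share of True values
as_ints = df.astype(int)                # explicit 0/1 encoding if a tool needs it
print(conversion_rate, as_ints["Weekend"].tolist())
```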

Fixing the data types

  • Month and VisitorType are of the object type; we can change them to categories.

Converting "object" columns to "category" reduces the memory required to store the dataframe.

In [10]:
data["Month"] = data["Month"].astype("category")
data["VisitorType"] = data["VisitorType"].astype("category")
In [11]:
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12205 entries, 0 to 12329
Data columns (total 18 columns):
 #   Column                   Non-Null Count  Dtype   
---  ------                   --------------  -----   
 0   Administrative           12205 non-null  int64   
 1   Administrative_Duration  12205 non-null  float64 
 2   Informational            12205 non-null  int64   
 3   Informational_Duration   12205 non-null  float64 
 4   ProductRelated           12205 non-null  int64   
 5   ProductRelated_Duration  12205 non-null  float64 
 6   BounceRates              12205 non-null  float64 
 7   ExitRates                12205 non-null  float64 
 8   PageValues               12205 non-null  float64 
 9   SpecialDay               12205 non-null  float64 
 10  Month                    12205 non-null  category
 11  OperatingSystems         12205 non-null  int64   
 12  Browser                  12205 non-null  int64   
 13  Region                   12205 non-null  int64   
 14  TrafficType              12205 non-null  int64   
 15  VisitorType              12205 non-null  category
 16  Weekend                  12205 non-null  bool    
 17  Revenue                  12205 non-null  bool    
dtypes: bool(2), category(2), float64(7), int64(7)
memory usage: 1.4 MB
  • Month and VisitorType have been converted to categories.

We can see that the memory usage has decreased from 1.6+ MB to 1.4 MB.
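The "+" in "1.6+ MB" means `info()` reported a lower bound: by default it does not measure the strings stored inside object columns, so the real saving from the conversion is larger than the two figures suggest. `data.info(memory_usage="deep")` reports the full figure. The sketch below shows the effect of deep accounting on a hypothetical Month-like column.

```python
import pandas as pd

# Hypothetical column resembling Month: 10,000 rows drawn from 10 short labels.
months = ["Feb", "Mar", "May", "June", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
as_object = pd.Series(months * 1000, dtype=object)
as_category = as_object.astype("category")

# deep=True also counts the Python string objects held by the object column
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)
```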

Check for missing values

In [12]:
data.isnull().sum()
Out[12]:
Administrative             0
Administrative_Duration    0
Informational              0
Informational_Duration     0
ProductRelated             0
ProductRelated_Duration    0
BounceRates                0
ExitRates                  0
PageValues                 0
SpecialDay                 0
Month                      0
OperatingSystems           0
Browser                    0
Region                     0
TrafficType                0
VisitorType                0
Weekend                    0
Revenue                    0
dtype: int64
  • There are no missing values in the data.

Give a statistical summary for the dataset.

In [13]:
data.describe().T
Out[13]:
count mean std min 25% 50% 75% max
Administrative 12205.0 2.338878 3.330436 0.0 0.000000 1.000000 4.000000 27.000000
Administrative_Duration 12205.0 81.646331 177.491845 0.0 0.000000 9.000000 94.700000 3398.750000
Informational 12205.0 0.508726 1.275617 0.0 0.000000 0.000000 0.000000 24.000000
Informational_Duration 12205.0 34.825454 141.424807 0.0 0.000000 0.000000 0.000000 2549.375000
ProductRelated 12205.0 32.045637 44.593649 0.0 8.000000 18.000000 38.000000 705.000000
ProductRelated_Duration 12205.0 1206.982457 1919.601400 0.0 193.000000 608.942857 1477.154762 63973.522230
BounceRates 12205.0 0.020370 0.045255 0.0 0.000000 0.002899 0.016667 0.200000
ExitRates 12205.0 0.041466 0.046163 0.0 0.014231 0.025000 0.048529 0.200000
PageValues 12205.0 5.949574 18.653671 0.0 0.000000 0.000000 0.000000 361.763742
SpecialDay 12205.0 0.061942 0.199666 0.0 0.000000 0.000000 0.000000 1.000000
OperatingSystems 12205.0 2.124211 0.906823 1.0 2.000000 2.000000 3.000000 8.000000
Browser 12205.0 2.357804 1.710114 1.0 2.000000 2.000000 2.000000 13.000000
Region 12205.0 3.153298 2.402340 1.0 1.000000 3.000000 4.000000 9.000000
TrafficType 12205.0 4.073904 4.016654 1.0 2.000000 2.000000 4.000000 20.000000
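  • The page counts, durations, BounceRates, ExitRates, PageValues and SpecialDay all seem to be right-skewed: their means are above their medians. OperatingSystems, Browser, Region and TrafficType are masked codes, so their skew is not meaningful.
  • Customers spend the most time on ProductRelated pages (a mean of about 1207 seconds, versus about 82 for Administrative and 35 for Informational pages).
  • Many customers haven't visited any administrative page (the 25th percentile is 0), and at least 75% haven't visited any informational page (the 75th percentile is 0).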
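One way to back the right-skew observation with numbers is pandas' `Series.skew()` (positive values indicate a longer right tail) together with a mean/median comparison. On the real data this would be, e.g., `data.select_dtypes("number").skew()`; below it runs on a hypothetical sample.

```python
import pandas as pd

# Hypothetical right-skewed sample: mostly small values plus a long right tail.
durations = pd.Series([0, 0, 0, 5, 10, 12, 20, 35, 90, 3000])

# a positive skew and a mean far above the median both point to a right tail
print(durations.mean(), durations.median(), durations.skew())
```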
In [14]:
data.describe(include=['category','bool'])
Out[14]:
Month VisitorType Weekend Revenue
count 12205 12205 12205 12205
unique 10 3 2 2
top May Returning_Visitor False False
freq 3329 10431 9346 10297
In [15]:
data['Month'].unique()
Out[15]:
[Feb, Mar, May, Oct, June, Jul, Aug, Nov, Sep, Dec]
Categories (10, object): [Feb, Mar, May, Oct, ..., Aug, Nov, Sep, Dec]
In [16]:
data['VisitorType'].unique()
Out[16]:
[Returning_Visitor, New_Visitor, Other]
Categories (3, object): [Returning_Visitor, New_Visitor, Other]
In [17]:
data['Weekend'].unique()
Out[17]:
array([False,  True])
In [18]:
data['Revenue'].unique()
Out[18]:
array([False,  True])
  • The data covers 10 months; there is no data for January or April.
  • May has the most sessions (3329), so the website had the most active customers in May.
  • Most sessions (10431 of 12205) come from returning visitors, which is a good thing for the business.
  • Most sessions (9346 of 12205) take place on weekdays.
  • Only a small portion of sessions (1908 of 12205) generate revenue.

EDA

Univariate analysis

In [19]:
# While doing univariate analysis of numerical variables we want to study their central tendency
# and dispersion.
# Let us write a function that will help us create a boxplot and a histogram for any input numerical
# variable.
# This function takes the numerical column as the input and draws the boxplot
# and histogram for the variable.
# Let us see if this helps us write faster and cleaner code.
def histogram_boxplot(feature, figsize=(15,10), bins = None):
    """ Boxplot and histogram combined
    feature: 1-d feature array
    figsize: size of fig (default (15,10))
    bins: number of bins (default None / auto)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                           sharex = True, # x-axis will be shared among all subplots
                                           gridspec_kw = {"height_ratios": (.25, .75)}, 
                                           figsize = figsize 
                                           ) # creating the 2 subplots
    sns.boxplot(x=feature, ax=ax_box2, showmeans=True, color='violet') # boxplot; a marker indicates the mean value of the column
    # histogram; sns.histplot replaces the deprecated sns.distplot (seaborn >= 0.11)
    sns.histplot(x=feature, kde=False, ax=ax_hist2, bins=bins if bins else 'auto')
    ax_hist2.axvline(np.mean(feature), color='green', linestyle='--') # Add mean to the histogram
    ax_hist2.axvline(np.median(feature), color='black', linestyle='-') # Add median to the histogram

Observations on Administrative_Duration

In [20]:
histogram_boxplot(data["Administrative_Duration"])
  • The distribution of Administrative_Duration is right-skewed.
  • There are outliers in this variable.
  • From the boxplot we can see that the third quartile (Q3) is about 94.7 seconds, which means 75% of sessions spend at most about 95 seconds on Administrative pages.
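Reading a quartile off the boxplot can be checked numerically with `Series.quantile(0.75)`; by construction, at least 75% of the values are at or below Q3. On the real data, `data["Administrative_Duration"].quantile(0.75)` should match the 94.7 reported by `describe()`. The sketch uses hypothetical durations.

```python
import pandas as pd

# Hypothetical per-session administrative durations, in seconds.
admin_duration = pd.Series([0, 0, 0, 9, 30, 60, 94.7, 120, 400])

q3 = admin_duration.quantile(0.75)                 # third quartile (linear interpolation)
share_at_or_below = (admin_duration <= q3).mean()  # fraction of sessions at or below Q3
print(q3, share_at_or_below)
```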

Observations on Informational

In [21]:
histogram_boxplot(data["Informational"])
  • The distribution of Informational is right-skewed.
  • There are outliers in this variable.
  • Most sessions include no visit to an informational page (the 75th percentile is 0).

Observations on Informational_Duration

In [22]:
histogram_boxplot(data["Informational_Duration"])
  • The distribution of Informational_Duration is right-skewed.
  • There are outliers in this variable.
  • On average, customers spend about 35 seconds per session on informational pages, although the median is 0.

Observations on ProductRelated

In [23]:
histogram_boxplot(data["ProductRelated"])
  • The distribution of ProductRelated is right-skewed.
  • There are outliers in this variable.
  • From the boxplot we can see that the third quartile (Q3) is 38, which means 75% of sessions include at most 38 product-related pages; the mean is about 32 pages per session.

Observations on ProductRelated_Duration

In [24]:
histogram_boxplot(data["ProductRelated_Duration"])
  • The distribution of ProductRelated_Duration is right-skewed.
  • There are outliers in this variable.
  • On average, customers spend about 1207 seconds (~20 minutes) per session on ProductRelated pages, far more than on Administrative and Informational pages.

Observations on BounceRates

In [25]:
histogram_boxplot(data["BounceRates"])
  • The distribution of BounceRates is right-skewed.
  • There are outliers in this variable.
  • The mean bounce rate is about 0.020.

Observations on ExitRates

In [26]:
histogram_boxplot(data["ExitRates"])
  • The distribution of ExitRates is right-skewed.
  • There are outliers in this variable.
  • The mean exit rate is about 0.041.

Observations on PageValues

In [27]:
histogram_boxplot(data["PageValues"])
  • The distribution of PageValues is right-skewed.
  • There are outliers in this variable.
  • The mean PageValue is about 5.95, but the median and Q3 are both 0: at least 75% of sessions have a PageValue of 0, so the mean is driven by a minority of sessions.
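Because the median and Q3 of PageValues are both 0, it helps to measure the share of zero-valued sessions directly; on the real data this would be `(data["PageValues"] == 0).mean()`. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical PageValues sample: most sessions have a page value of 0.
page_values = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.24, 38.5, 0.0, 120.0])

share_zero = (page_values == 0).mean()   # fraction of sessions with no page value
print(page_values.mean(), page_values.median(), share_zero)
```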

Observations on SpecialDay

In [28]:
histogram_boxplot(data["SpecialDay"])
  • The distribution of SpecialDay suggests that most customers visited the website on days that were not close to a special occasion.
  • The distribution of SpecialDay also suggests we should treat it as a category to extract more information.

Observations on OperatingSystems

In [29]:
histogram_boxplot(data["OperatingSystems"])
  • The distribution of OperatingSystems shows that most customers accessed the website using operating system '2'.
  • These values are masked, so it is difficult to comment on them.
  • The distribution of OperatingSystems also suggests we should treat it as a category to extract more information.

Observations on Browser

In [30]:
histogram_boxplot(data["Browser"])
  • The distribution of Browser shows that most customers accessed the website using browser '2'.
  • These values are masked, so it is difficult to comment on them.
  • The distribution of Browser also suggests we should treat it as a category to extract more information.

Observations on Region

In [31]:
histogram_boxplot(data["Region"])
  • The distribution of Region shows that most customers accessed the website from Region 1.
  • These values are masked, so it is difficult to comment on them.
  • The distribution of Region also suggests we should treat it as a category to extract more information.

Observations on TrafficType

In [32]:
histogram_boxplot(data["TrafficType"])
  • The distribution of TrafficType shows that the most common traffic type is '2'.
  • These values are masked, so it is difficult to comment on them.
  • The distribution of TrafficType also suggests we should treat it as a category to extract more information.
In [33]:
# Function to create barplots that indicate the percentage for each category.

def perc_on_bar(plot, feature):
    '''
    plot: the Axes returned by sns.countplot
    feature: categorical feature
    the function won't work if a column is passed in the hue parameter
    '''
    total = len(feature) # length of the column
    for p in plot.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height()/total) # percentage of each class of the category
        x = p.get_x() + p.get_width() / 2 - 0.05 # x position: centre of the bar
        y = p.get_y() + p.get_height()           # y position: top of the bar
        plot.annotate(percentage, (x, y), size = 12) # annotate the percentage
    plt.show() # show the plot
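The percentages drawn by `perc_on_bar` can be cross-checked without a plot using `value_counts(normalize=True)`; on the real data this would be, e.g., `data["VisitorType"].value_counts(normalize=True)`. The sketch uses a hypothetical 20-row column.

```python
import pandas as pd

# Hypothetical categorical column shaped like VisitorType.
visitor_type = pd.Series(["Returning_Visitor"] * 17 + ["New_Visitor"] * 2 + ["Other"])

# Percentage of each category: the same numbers perc_on_bar writes above each bar
percentages = visitor_type.value_counts(normalize=True).mul(100).round(1)
print(percentages)
```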

Observations on Month

In [34]:
plt.figure(figsize=(15,5))
ax = sns.countplot(x=data["Month"],palette='winter',order=['Feb','Mar','May','June','Jul','Aug','Sep','Oct','Nov','Dec'])
perc_on_bar(ax,data["Month"])
  • 27.3% of sessions took place in May, followed by November with 24.4%.

Observations on VisitorType

In [35]:
plt.figure(figsize=(15,5))
ax = sns.countplot(x=data["VisitorType"],palette='winter')
perc_on_bar(ax,data["VisitorType"])
  • 85.5% of sessions come from returning customers, indicating the website has a loyal customer base.
  • During these 10 months, 13.9% of sessions also came from new visitors.

Observations on SpecialDay

In [36]:
plt.figure(figsize=(15,5))
ax = sns.countplot(x=data["SpecialDay"],palette='winter')
perc_on_bar(ax,data["SpecialDay"])
  • 89.8% of website sessions are on non-special days.

Observations on OperatingSystems

In [37]:
plt.figure(figsize=(15,5))
ax = sns.countplot(x=data["OperatingSystems"],palette='winter')
perc_on_bar(ax,data["OperatingSystems"])
  • 53.6% of the customers use '2' operating system.

Observations on Region

In [38]:
plt.figure(figsize=(15,5))
ax = sns.countplot(x=data["Region"],palette='winter')
perc_on_bar(ax,data["Region"])
  • 38.6% of the website sessions are from customers of Region 1.

Observations on TrafficType

In [39]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["TrafficType"],palette='winter')
perc_on_bar(ax,data["TrafficType"])
  • 32% of the traffic on website is of type 2.

Observations on Weekend

In [40]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["Weekend"],palette='winter')
perc_on_bar(ax,data["Weekend"])
  • 76.6% of the website sessions are on Non-weekend days.

Observations on Revenue

In [41]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["Revenue"],palette='winter')
perc_on_bar(ax,data["Revenue"])
  • The website generates revenue from only 15.6% of sessions.

Observations on Administrative

In [42]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["Administrative"],palette='winter')
perc_on_bar(ax,data["Administrative"])
  • 46.2% of the customers have not visited the administrative page.

Observations on Informational

In [43]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["Informational"],palette='winter')
perc_on_bar(ax,data["Informational"])
  • 78.4% of the customers have not visited the Informational page.

Observations on ProductRelated

In [44]:
plt.figure(figsize=(15,7))
ax = sns.countplot(x=data["ProductRelated"],palette='winter')
perc_on_bar(ax,data["ProductRelated"])
In [45]:
(data["ProductRelated"] == 0).mean() * 100  # percentage of sessions with no product-related page
Out[45]:
0.311347808275297
  • Only 0.3% of customers have not visited the product related pages.

Bivariate Analysis

In [46]:
plt.figure(figsize=(15,7))
sns.heatmap(data.corr(numeric_only=True),annot=True)  # numeric_only skips the category columns
plt.show()
  • Revenue shows the highest correlation with PageValues (0.49), most likely because PageValues takes into account the pages visited before reaching the transaction page.
  • The Administrative, Informational and ProductRelated page counts are correlated with the time spent on those pages, which is expected.
  • BounceRates and ExitRates are very highly correlated (0.9) with each other.
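On recent pandas versions, `DataFrame.corr()` raises an error when category columns such as Month are present, which is why `numeric_only=True` (pandas 1.5 or later) is needed. Ranking the correlations with Revenue makes the first observation easy to verify; a sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame mixing numeric, bool and category columns.
df = pd.DataFrame({
    "PageValues": [0.0, 0.0, 12.2, 0.0, 38.5, 0.0],
    "ExitRates": [0.20, 0.10, 0.02, 0.05, 0.01, 0.20],
    "Revenue": [False, False, True, False, True, False],
    "Month": pd.Categorical(["Feb", "Mar", "Nov", "May", "Nov", "Feb"]),
})

# numeric_only=True skips the category column; the bool column is kept as 0/1
corr_with_revenue = (
    df.corr(numeric_only=True)["Revenue"].drop("Revenue").sort_values(ascending=False)
)
print(corr_with_revenue)
```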
In [47]:
sns.pairplot(data=data,hue="Revenue",)
plt.show()
  • Several variables have visibly different distributions for revenue and non-revenue sessions; we investigate these further.

Revenue vs Administrative, Informational and ProductRelated pages and time spent on these pages

In [48]:
cols = ['Administrative','Administrative_Duration','Informational','Informational_Duration','ProductRelated','ProductRelated_Duration']
plt.figure(figsize=(12,7))

for i, variable in enumerate(cols):
    plt.subplot(3,2,i+1)
    sns.boxplot(x=data["Revenue"],y=data[variable],palette="PuBu")  # one box per Revenue class
    plt.title(variable)
plt.tight_layout()
plt.show()

It is difficult to interpret the graphs above, so let's visualize them again with the outliers hidden (in the plots only; the original data is unchanged) to get a better understanding.

In [49]:
cols = ['Administrative','Administrative_Duration','Informational','Informational_Duration','ProductRelated','ProductRelated_Duration']
plt.figure(figsize=(12,7))

for i, variable in enumerate(cols):
    plt.subplot(3,2,i+1)
    sns.boxplot(x=data["Revenue"],y=data[variable],palette="PuBu",showfliers=False)  # hide outliers in the plot only
    plt.title(variable)
plt.tight_layout()
plt.show()
  • Sessions that generate revenue tend to include more visits to Administrative, Informational and ProductRelated pages than sessions that don't.
  • Sessions that generate revenue also tend to spend more time on Administrative, Informational and ProductRelated pages.
  • In both cases, page visits and time spent on pages have many outliers.
  • There is a clear distinction between the groups: visiting more pages and spending more time on them is associated with revenue.
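`showfliers=False` only hides points in the plot; it does not change the data. The hidden points are those beyond matplotlib's default whiskers, 1.5 × IQR past the quartiles. A sketch of that rule on hypothetical ProductRelated_Duration values:

```python
import pandas as pd

# Hypothetical ProductRelated_Duration values (seconds) for a handful of sessions.
durations = pd.Series([0, 120, 190, 300, 450, 610, 900, 1480, 2100, 12000])

q1, q3 = durations.quantile([0.25, 0.75])       # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # default 1.5 * IQR whisker rule
fliers = durations[(durations < lower) | (durations > upper)]
print(lower, upper, fliers.tolist())
```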

Revenue vs Bounce Rates,Exit Rates

In [50]:
cols = ['BounceRates','ExitRates']
plt.figure(figsize=(10,5))

for i, variable in enumerate(cols):
    plt.subplot(1,2,i+1)
    sns.boxplot(x=data["Revenue"],y=data[variable],palette="PuBu")  # one box per Revenue class
    plt.title(variable)
plt.tight_layout()
plt.show()
  • As expected, sessions with higher bounce and exit rates rarely generate revenue: a visitor who leaves without triggering any further request to the analytics server (the definition of a bounce) is unlikely to contribute to revenue, and the same holds for exit rates.

Revenue vs PageValues

In [51]:
plt.figure(figsize=(10,5))
sns.boxplot(x=data['Revenue'],y=data['PageValues'])
plt.show()
  • Higher PageValues means higher contribution to revenue.

Revenue vs SpecialDay

In [52]:
def stacked_plot(x):
    sns.set(palette='nipy_spectral')
    # counts of non-revenue / revenue sessions per category, with totals
    tab1 = pd.crosstab(x,data['Revenue'],margins=True)
    print(tab1)
    print('-'*120)
    # row-normalized shares, so each stacked bar sums to 1
    tab = pd.crosstab(x,data['Revenue'],normalize='index')
    tab.plot(kind='bar',stacked=True,figsize=(10,5))
    plt.legend(loc="upper left", bbox_to_anchor=(1,1))  # place the legend outside the plot
    plt.show()
In [53]:
stacked_plot(data['SpecialDay'])
Revenue     False  True    All
SpecialDay                    
0.0          9125  1831  10956
0.2           164    14    178
0.4           230    13    243
0.6           321    29    350
0.8           313    11    324
1.0           144    10    154
All         10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
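  • Most revenue sessions occur on regular days, which is understandable since most days are not special days. However, the conversion rate is also much lower near special days: about 6.2% of sessions with a nonzero SpecialDay generate revenue (77 of 1249), versus about 16.7% on regular days (1831 of 10956).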
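Pooling the nonzero SpecialDay levels turns the crosstab into a single comparison of conversion rates. The counts below are copied from the table above; grouping the levels into "special" and "non-special" is our own choice.

```python
import pandas as pd

# Counts copied from the SpecialDay crosstab above (sessions without / with revenue).
counts = pd.DataFrame(
    {"no_revenue": [9125, 164, 230, 321, 313, 144],
     "revenue": [1831, 14, 13, 29, 11, 10]},
    index=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
)

is_special = counts.index > 0             # any nonzero SpecialDay level
non_special = counts[~is_special].sum()   # column totals for SpecialDay == 0
special = counts[is_special].sum()        # column totals for SpecialDay > 0

# conversion rate = revenue sessions / all sessions in the group
rate_non_special = non_special["revenue"] / non_special.sum()
rate_special = special["revenue"] / special.sum()
print(round(rate_non_special, 3), round(rate_special, 3))
```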

Revenue vs Month

In [54]:
stacked_plot(data["Month"])
Revenue  False  True    All
Month                      
Aug        357    76    433
Dec       1490   216   1706
Feb        178     3    181
Jul        366    66    432
June       256    29    285
Mar       1668   192   1860
May       2964   365   3329
Nov       2222   760   2982
Oct        434   115    549
Sep        362    86    448
All      10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • November and May contribute the most revenue sessions (760 and 365).
  • There are fewer sessions in November than in May, but many more revenue sessions: about 25.5% of November sessions generate revenue, versus about 11.0% in May.
In [55]:
revenue_data = data[data['Revenue']==True]
revenue_data.groupby(['Month','SpecialDay'])['Revenue'].count()
Out[55]:
Month  SpecialDay
Aug    0.0            76
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Dec    0.0           216
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Feb    0.0             1
       0.2             0
       0.4             0
       0.6             0
       0.8             1
       1.0             1
Jul    0.0            66
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
June   0.0            29
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Mar    0.0           192
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
May    0.0           290
       0.2            14
       0.4            13
       0.6            29
       0.8            10
       1.0             9
Nov    0.0           760
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Oct    0.0           115
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Sep    0.0            86
       0.2             0
       0.4             0
       0.6             0
       0.8             0
       1.0             0
Name: Revenue, dtype: int64
  • Revenue sessions on or around a special day were observed almost only in May (75 sessions, plus 2 in February), which suggests the website is not capitalizing on special occasions.
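The long groupby listing above can be condensed into a month-by-SpecialDay grid with `pd.crosstab`, which fills combinations that never occur with 0. On the real data this would be `pd.crosstab(revenue_data["Month"], revenue_data["SpecialDay"])`; the sketch uses hypothetical rows.

```python
import pandas as pd

# Hypothetical revenue sessions: (Month, SpecialDay) pairs.
revenue_sessions = pd.DataFrame({
    "Month": ["May", "May", "May", "Nov", "Nov", "Feb"],
    "SpecialDay": [0.0, 0.6, 0.2, 0.0, 0.0, 1.0],
})

# One row per month, one column per observed SpecialDay level, cells = session counts
table = pd.crosstab(revenue_sessions["Month"], revenue_sessions["SpecialDay"])
print(table)
```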

Revenue vs OperatingSystems

In [56]:
stacked_plot(data["OperatingSystems"])
Revenue           False  True    All
OperatingSystems                    
1                  2170   379   2549
2                  5386  1155   6541
3                  2262   268   2530
4                   393    85    478
5                     5     1      6
6                    17     2     19
7                     6     1      7
8                    58    17     75
All               10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • Sessions using operating system '2' contribute the most revenue sessions (1155 of 1908, about 61%); about 17.7% of sessions on operating system '2' generate revenue.
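Two different questions hide behind "highest contributor": what share of all revenue sessions a group supplies, and how often a session in that group converts. The sketch below computes both from counts copied from the crosstab above; restricting it to the four largest operating systems is our choice to keep it short.

```python
import pandas as pd

# Counts copied from the OperatingSystems crosstab above (largest groups only).
counts = pd.DataFrame(
    {"no_revenue": [2170, 5386, 2262, 393], "revenue": [379, 1155, 268, 85]},
    index=[1, 2, 3, 4],
)
total_revenue_sessions = 1908  # "All" row of the same table

# Share of all revenue sessions: how much each group contributes
share_of_revenue = counts["revenue"] / total_revenue_sessions
# Conversion rate: how likely a session in the group is to generate revenue
conversion_rate = counts["revenue"] / counts.sum(axis=1)
print(pd.DataFrame({"share_of_revenue": share_of_revenue.round(3),
                    "conversion_rate": conversion_rate.round(3)}))
```

Operating system '4' converts at a slightly higher rate than '2' even though it supplies far fewer revenue sessions.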

Revenue vs Browser

In [57]:
stacked_plot(data["Browser"])
Revenue  False  True    All
Browser                    
1         2062   365   2427
2         6660  1223   7883
3          100     5    105
4          601   130    731
5          379    86    465
6          154    20    174
7           43     6     49
8          114    21    135
9            1     0      1
10         131    32    163
11           5     1      6
12           7     3     10
13          40    16     56
All      10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • Browser '2' contributes the most revenue sessions (1223); about 15.5% of sessions using browser '2' (1223 of 7883) generate revenue.

Revenue vs Region

In [58]:
stacked_plot(data["Region"])
Revenue  False  True    All
Region                     
1         3943   771   4714
2          940   188   1128
3         2030   349   2379
4          996   175   1171
5          266    52    318
6          689   112    801
7          639   119    758
8          375    56    431
9          419    86    505
All      10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • The share of revenue sessions is similar across regions (roughly 13% to 17%).

Revenue vs TrafficType

In [59]:
stacked_plot(data["TrafficType"])
Revenue      False  True    All
TrafficType                    
1             2126   262   2388
2             3064   847   3911
3             1833   180   2013
4              901   165   1066
5              204    56    260
6              390    53    443
7               28    12     40
8              248    95    343
9               37     4     41
10             360    90    450
11             200    47    247
12               1     0      1
13             685    43    728
14              11     2     13
15              37     0     37
16               2     1      3
17               1     0      1
18              10     0     10
19              16     1     17
20             143    50    193
All          10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • The share of revenue sessions varies across traffic sources. Traffic type 2 brings the most revenue sessions (847, about 44% of all 1908), but among sources with a reasonable number of sessions, types 8 (about 28%) and 20 (about 26%) convert at a higher rate.
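How traffic sources rank by conversion rate depends on how small sources are treated. Traffic type 7, for example, converts 12 of 40 sessions, but counts that small are noisy. The sketch below ranks traffic types by conversion rate after dropping sources with fewer than 100 sessions. The cut-off is an arbitrary assumption, and the counts are copied from the crosstab above:

```python
import pandas as pd

# Revenue = False / True counts per traffic type, copied from the crosstab above
traffic = pd.DataFrame(
    {"no_revenue": [2126, 3064, 1833, 901, 204, 390, 28, 248, 37, 360,
                    200, 1, 685, 11, 37, 2, 1, 10, 16, 143],
     "revenue": [262, 847, 180, 165, 56, 53, 12, 95, 4, 90,
                 47, 0, 43, 2, 0, 1, 0, 0, 1, 50]},
    index=pd.Index(range(1, 21), name="TrafficType"),
)
traffic["sessions"] = traffic["no_revenue"] + traffic["revenue"]
traffic["conversion_rate"] = traffic["revenue"] / traffic["sessions"]

# Ignore tiny sources (assumed cut-off of 100 sessions) whose rates are unreliable
MIN_SESSIONS = 100
ranked = (traffic[traffic["sessions"] >= MIN_SESSIONS]
          .sort_values("conversion_rate", ascending=False))
print(ranked.head())
```

With this cut-off, the top three are traffic types 8, 20 and 2. Type 2 therefore leads on volume rather than on rate.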

Revenue vs VisitorType

In [60]:
stacked_plot(data["VisitorType"])
Revenue            False  True    All
VisitorType                          
New_Visitor         1271   422   1693
Other                 65    16     81
Returning_Visitor   8961  1470  10431
All                10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
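  • Returning visitors contribute the most revenue sessions (1470 of 1908). Interestingly, new visitors convert at a higher rate: about 25% of their sessions (422 of 1693) generate revenue, compared with about 14% (1470 of 10431) for returning visitors.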

Revenue vs Weekend

In [61]:
stacked_plot(data["Weekend"])
Revenue  False  True    All
Weekend                    
False     7937  1409   9346
True      2360   499   2859
All      10297  1908  12205
------------------------------------------------------------------------------------------------------------------------
  • Weekend and weekday sessions convert at similar rates: about 17% on weekends (499 of 2859) versus about 15% on weekdays (1409 of 9346).

Customer-Level Analysis

In [62]:
cols = ['Administrative', 'Administrative_Duration', 'Informational',
        'Informational_Duration', 'ProductRelated', 'ProductRelated_Duration']
plt.figure(figsize=(10, 10))

for i, variable in enumerate(cols):
    plt.subplot(3, 2, i + 1)
    # Keyword arguments are required by recent seaborn; outliers are hidden (showfliers=False)
    sns.boxplot(data=data, x="VisitorType", y=variable, hue="Revenue",
                palette="PuBu", showfliers=False)
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.title(variable)

plt.tight_layout()
plt.show()
  • New visitors who visited more administrative pages and spent more time on them tended not to generate revenue. This suggests that the administrative pages could be made simpler and more user-friendly.
  • Returning visitors who visited informational pages did generate revenue, which suggests that the informational pages give customers the information they need.
  • Product-related pages appear to be doing a fair job.
In [63]:
tab1 = pd.crosstab(data['Month'], data['VisitorType'], margins=True)
print(tab1)
print('-'*120)
# Normalize each row so that the bars show the visitor-type mix within each month
tab = pd.crosstab(data['Month'], data['VisitorType'], normalize='index')
tab.plot(kind='bar', stacked=True, figsize=(10, 5))
plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.show()
VisitorType  New_Visitor  Other  Returning_Visitor    All
Month                                                    
Aug                   72      0                361    433
Dec                  334     58               1314   1706
Feb                    1      0                180    181
Jul                   54      0                378    432
June                  30      1                254    285
Mar                  232      0               1628   1860
May                  319      0               3010   3329
Nov                  419     22               2541   2982
Oct                  124      0                425    549
Sep                  108      0                340    448
All                 1693     81              10431  12205
------------------------------------------------------------------------------------------------------------------------
  • November and December saw the most new visitors, but the share of sessions from new visitors is highest in September (about 24%) and October (about 23%).

Data Preparation

In [64]:
# PageValues is derived from pages a user visited before completing a transaction, so it leaks
# information about the target (Revenue) and would bias the model; drop it.
data = data.drop(['PageValues'], axis=1)
In [65]:
dummy_data = pd.get_dummies(data, columns=['Month','VisitorType','Weekend','Region','Browser',
                                           'OperatingSystems','SpecialDay'],drop_first=True)
dummy_data.head()
Out[65]:
Administrative Administrative_Duration Informational Informational_Duration ProductRelated ProductRelated_Duration BounceRates ExitRates TrafficType Revenue ... OperatingSystems_4 OperatingSystems_5 OperatingSystems_6 OperatingSystems_7 OperatingSystems_8 SpecialDay_0.2 SpecialDay_0.4 SpecialDay_0.6 SpecialDay_0.8 SpecialDay_1.0
0 0 0.0 0 0.0 1 0.000000 0.20 0.20 1 False ... 0 0 0 0 0 0 0 0 0 0
1 0 0.0 0 0.0 2 64.000000 0.00 0.10 2 False ... 0 0 0 0 0 0 0 0 0 0
2 0 0.0 0 0.0 1 0.000000 0.20 0.20 3 False ... 1 0 0 0 0 0 0 0 0 0
3 0 0.0 0 0.0 2 2.666667 0.05 0.14 4 False ... 0 0 0 0 0 0 0 0 0 0
4 0 0.0 0 0.0 10 627.500000 0.02 0.05 4 False ... 0 0 0 0 0 0 0 0 0 0

5 rows × 54 columns
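With `drop_first=True`, `get_dummies` drops one level per encoded column. This is why `Month_Aug`, `VisitorType_New_Visitor`, `Region_1`, `Browser_1` and the other first levels do not appear among the features. The dropped level becomes the reference category, represented by all-zero dummies. Here is a small illustration on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame with the same kind of categorical columns as `data`
toy = pd.DataFrame({"Month": ["Aug", "Dec", "Feb", "Dec"],
                    "Weekend": [False, True, False, False]})

# drop_first=True drops the first (sorted) level of each column:
# 'Aug' for Month and False for Weekend become the all-zero reference rows
encoded = pd.get_dummies(toy, columns=["Month", "Weekend"], drop_first=True)
print(encoded.columns.tolist())
# ['Month_Dec', 'Month_Feb', 'Weekend_True']
```

Dropping one level per column avoids perfectly collinear dummies. This matters more for linear models than for a decision tree, but it keeps the feature set compact either way.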

Model Building - Approach

  1. Data preparation
  2. Partition the data into training and test sets.
  3. Build a CART model on the training data.
  4. Tune the model and prune the tree, if required.
  5. Evaluate the model on the test set.
In [66]:
column_names = list(dummy_data.columns)
column_names.remove('Revenue')                     # Keep only names of features by removing the name of target variable
feature_names = column_names
print(feature_names)
['Administrative', 'Administrative_Duration', 'Informational', 'Informational_Duration', 'ProductRelated', 'ProductRelated_Duration', 'BounceRates', 'ExitRates', 'TrafficType', 'Month_Dec', 'Month_Feb', 'Month_Jul', 'Month_June', 'Month_Mar', 'Month_May', 'Month_Nov', 'Month_Oct', 'Month_Sep', 'VisitorType_Other', 'VisitorType_Returning_Visitor', 'Weekend_True', 'Region_2', 'Region_3', 'Region_4', 'Region_5', 'Region_6', 'Region_7', 'Region_8', 'Region_9', 'Browser_2', 'Browser_3', 'Browser_4', 'Browser_5', 'Browser_6', 'Browser_7', 'Browser_8', 'Browser_9', 'Browser_10', 'Browser_11', 'Browser_12', 'Browser_13', 'OperatingSystems_2', 'OperatingSystems_3', 'OperatingSystems_4', 'OperatingSystems_5', 'OperatingSystems_6', 'OperatingSystems_7', 'OperatingSystems_8', 'SpecialDay_0.2', 'SpecialDay_0.4', 'SpecialDay_0.6', 'SpecialDay_0.8', 'SpecialDay_1.0']

Split Data

In [67]:
X = dummy_data.drop('Revenue',axis=1)                                                 # Features
y = dummy_data['Revenue'].astype('int64')                                             # Labels (Target Variable)
# converting target to integers - since some functions might not work with bool type
In [68]:
# Splitting data into training and test set:
X_train, X_test, y_train, y_test =train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)
(8543, 53) (3662, 53)
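The split above is not stratified. With `random_state=1`, the training set ends up with about 15.45% positives, against 1908/12205 ≈ 15.6% overall, so the difference is small here. Passing `stratify=y` to `train_test_split` is a common way to guarantee matching class proportions. The sketch below uses synthetic labels, not the Revenue column:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: same number of rows as the dataset, ~15.6% positives
rng = np.random.default_rng(0)
y_demo = (rng.random(12205) < 0.156).astype(int)
X_demo = rng.normal(size=(12205, 3))

# stratify=y_demo keeps the positive-class share (nearly) identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo)
print(len(y_tr), len(y_te), y_tr.mean(), y_te.mean())
```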

Build Decision Tree Model

  • We build the model with scikit-learn's DecisionTreeClassifier, using the default 'gini' criterion to choose splits.
  • If class A makes up 10% of the samples and class B makes up 90%, class B dominates and the decision tree becomes biased toward it.

  • Here, only about 15% of sessions generate revenue, so we pass the dictionary {0:0.15,1:0.85} as class_weight. Each positive (class 1) sample then carries more weight than a negative one when the tree evaluates splits.

  • class_weight is a hyperparameter of the decision tree classifier.
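The weights {0:0.15, 1:0.85} roughly invert the class frequencies (about 85% negative, 15% positive). scikit-learn can derive equivalent weights automatically with `class_weight='balanced'`. The sketch below applies `compute_class_weight` to hypothetical 85/15 labels to show that the balanced weights have the same ratio as the manual choice. For the splits, what mainly matters is that ratio rather than the absolute values:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 85 negatives and 15 positives (about the Revenue imbalance)
y_demo = np.array([0] * 85 + [1] * 15)

# 'balanced' weights are n_samples / (n_classes * count_of_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_demo)
print(weights)  # approximately [0.588, 3.333]

# Ratio of positive to negative weight, compared with the manual {0: 0.15, 1: 0.85}
print(weights[1] / weights[0], 0.85 / 0.15)
```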

In [69]:
model = DecisionTreeClassifier(criterion='gini',class_weight={0:0.15,1:0.85},random_state=1)
In [70]:
model.fit(X_train, y_train)
Out[70]:
DecisionTreeClassifier(class_weight={0: 0.15, 1: 0.85}, random_state=1)
In [71]:
def make_confusion_matrix(model, y_actual, X_actual=None):
    '''
    model    : fitted classifier used to predict the labels
    y_actual : ground-truth labels
    X_actual : features to predict on (defaults to the global X_test)
    '''
    if X_actual is None:
        X_actual = X_test
    y_predict = model.predict(X_actual)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=["Predicted - No", "Predicted - Yes"])
    # Annotate each cell with its count and its share of all samples
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    labels = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    labels = np.asarray(labels).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=labels, fmt='')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
In [72]:
make_confusion_matrix(model,y_test)
In [73]:
y_train.value_counts(normalize=True)
Out[73]:
0    0.845488
1    0.154512
Name: Revenue, dtype: float64

Only about 15% of the training samples are positive, so a model that labels every sample as negative would still reach about 85% accuracy (0.845 here). Accuracy is therefore not a good metric for this problem.
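The claim above can be checked directly with a majority-class baseline. The sketch below uses scikit-learn's `DummyClassifier` on hypothetical labels with an 85/15 split, not the actual `y_train`:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels with roughly the same imbalance as y_train (15% positives)
y_demo = np.array([0] * 85 + [1] * 15)
X_demo = np.zeros((len(y_demo), 1))  # features are ignored by this baseline

# Always predict the majority class (0 = no revenue)
baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
pred = baseline.predict(X_demo)

print(accuracy_score(y_demo, pred))  # 0.85
print(recall_score(y_demo, pred))    # 0.0: no buyer is ever found
```

The baseline reaches 85% accuracy while finding none of the buyers, which is why accuracy alone is misleading here.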

Insights:

  • True Positives:

    • Reality: A customer made a purchase.
    • Model predicted: The customer will contribute to revenue.
    • Outcome: The model correctly identifies a customer who buys.
  • True Negatives:

    • Reality: A customer did NOT make a purchase.
    • Model predicted: The customer will NOT contribute to revenue.
    • Outcome: The business is unaffected.
  • False Positives:

    • Reality: A customer did NOT make a purchase.
    • Model predicted: The customer will contribute to revenue.
    • Outcome: The team targeting potential customers will waste resources on people who will not contribute to revenue.
  • False Negatives:

    • Reality: A customer made a purchase.
    • Model predicted: The customer will NOT contribute to revenue.
    • Outcome: The sales/marketing team misses a potential customer. The team could have offered this customer a discount or a loyalty card to bring them back to purchase, so customer retention suffers.
  • In this case, failing to identify a potential customer is the costliest error, so false negatives must be kept low. Recall, TP / (TP + FN), measures exactly this, which makes it the right metric for evaluating the model.
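Recall can be read straight off a confusion matrix like the one plotted earlier. The sketch below uses a handful of hypothetical labels to show that TP / (TP + FN) matches scikit-learn's `recall_score`:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 1 = customer made a purchase, 0 = no purchase
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]

# With labels=[0, 1], ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Recall = TP / (TP + FN): the share of actual buyers the model finds
manual_recall = tp / (tp + fn)
print(manual_recall, recall_score(y_true, y_pred))  # 0.5 0.5
```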
In [74]:
##  Function to calculate recall score
def get_recall_score(model):
    '''
    model : classifier to predict values of X

    '''
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    print("Recall on training set : ",metrics.recall_score(y_train,pred_train))
    print("Recall on test set : ",metrics.recall_score(y_test,pred_test))
In [76]:
get_recall_score(model)
Recall on training set :  1.0
Recall on test set :  0.31462585034013607
  • Recall is 1.0 on the training set but only about 0.31 on the test set. This large gap suggests that the model is overfitting the training data.
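One common way to see, and reduce, this kind of overfitting is to cap the tree depth and compare train and test recall at each setting. The sketch below runs that comparison on synthetic data from `make_classification`, not the session data, so the numbers it prints say nothing about this dataset. The `max_depth` values are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the session features (~15% positives)
X_demo, y_demo = make_classification(n_samples=3000, n_features=10,
                                     weights=[0.85], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          random_state=1, stratify=y_demo)

# Compare train and test recall as the tree is allowed to grow deeper;
# max_depth=None corresponds to a fully grown (unpruned) tree
results = {}
for depth in [2, 4, 6, None]:
    clf = DecisionTreeClassifier(class_weight={0: 0.15, 1: 0.85},
                                 max_depth=depth, random_state=1)
    clf.fit(X_tr, y_tr)
    results[depth] = (recall_score(y_tr, clf.predict(X_tr)),
                      recall_score(y_te, clf.predict(X_te)))

for depth, (train_rec, test_rec) in results.items():
    print(f"max_depth={depth}: train recall={train_rec:.3f}, test recall={test_rec:.3f}")
```

A fully grown tree typically scores perfect training recall, as in the cell above. Shallower trees trade some training recall for a smaller train/test gap.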

Visualizing the Decision Tree

In [77]:
plt.figure(figsize=(20, 30))
out = tree.plot_tree(model, feature_names=feature_names, filled=True, fontsize=9,
                     node_ids=False, class_names=None)
# Add arrows to the decision tree splits if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor('black')
        arrow.set_linewidth(1)
plt.show()
In [78]:
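# Text report showing the rules of the decision tree. export_text prints at most
# 10 levels by default (max_depth=10); deeper subtrees appear as "truncated branch of depth N".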

print(tree.export_text(model,feature_names=feature_names,show_weights=True))
|--- ExitRates <= 0.04
|   |--- ProductRelated_Duration <= 315.08
|   |   |--- Month_May <= 0.50
|   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |--- Month_Dec <= 0.50
|   |   |   |   |   |--- ExitRates <= 0.00
|   |   |   |   |   |   |--- Administrative_Duration <= 66.90
|   |   |   |   |   |   |   |--- Browser_4 <= 0.50
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 11.80
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  11.80
|   |   |   |   |   |   |   |   |   |--- Informational <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Month_June <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- Month_June >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Informational >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- Browser_4 >  0.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  8.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- Administrative_Duration >  66.90
|   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.00
|   |   |   |   |   |   |--- ProductRelated_Duration <= 163.21
|   |   |   |   |   |   |   |--- Region_9 <= 0.50
|   |   |   |   |   |   |   |   |--- Month_Nov <= 0.50
|   |   |   |   |   |   |   |   |   |--- Month_Feb <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 223.58
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  223.58
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Feb >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Month_Nov >  0.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 26.60
|   |   |   |   |   |   |   |   |   |   |--- Region_4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- Region_4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  26.60
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.03
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.03
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.95, 0.00] class: 0
|   |   |   |   |   |   |   |--- Region_9 >  0.50
|   |   |   |   |   |   |   |   |--- Informational <= 1.00
|   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Informational >  1.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  163.21
|   |   |   |   |   |   |   |--- ExitRates <= 0.04
|   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Region_7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- Region_7 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.03
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.03
|   |   |   |   |   |   |   |   |   |   |--- Administrative <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- Administrative >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- ExitRates >  0.04
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 237.38
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.03
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.03
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  237.38
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- Month_Jul <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- Month_Jul >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |--- Month_Dec >  0.50
|   |   |   |   |   |--- OperatingSystems_8 <= 0.50
|   |   |   |   |   |   |--- ProductRelated_Duration <= 205.22
|   |   |   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [12.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [3.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |   |   |--- Browser_4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Browser_4 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- ProductRelated_Duration >  205.22
|   |   |   |   |   |   |   |--- Administrative_Duration <= 75.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 11.50
|   |   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  11.50
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |--- Administrative <= 2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Administrative >  2.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.70, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |--- Administrative_Duration >  75.50
|   |   |   |   |   |   |   |   |--- weights: [2.70, 0.00] class: 0
|   |   |   |   |   |--- OperatingSystems_8 >  0.50
|   |   |   |   |   |   |--- ProductRelated_Duration <= 157.46
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 58.25
|   |   |   |   |   |   |   |   |--- ProductRelated <= 3.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  3.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  58.25
|   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  157.46
|   |   |   |   |   |   |   |--- ProductRelated <= 13.00
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 247.44
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 18.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  18.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  247.44
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |--- ProductRelated >  13.00
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |--- Month_Mar >  0.50
|   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |--- Browser_2 <= 0.50
|   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |--- Browser_2 >  0.50
|   |   |   |   |   |   |   |--- weights: [4.35, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |--- ProductRelated <= 6.50
|   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated >  6.50
|   |   |   |   |   |   |   |--- TrafficType <= 6.00
|   |   |   |   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated <= 8.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- ProductRelated >  8.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- TrafficType >  6.00
|   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |--- Administrative <= 3.50
|   |   |   |   |   |   |--- weights: [21.60, 0.00] class: 0
|   |   |   |   |   |--- Administrative >  3.50
|   |   |   |   |   |   |--- Administrative_Duration <= 21.50
|   |   |   |   |   |   |   |--- TrafficType <= 2.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- TrafficType >  2.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- Administrative_Duration >  21.50
|   |   |   |   |   |   |   |--- Administrative <= 4.50
|   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |--- Administrative >  4.50
|   |   |   |   |   |   |   |   |--- weights: [2.10, 0.00] class: 0
|   |   |--- Month_May >  0.50
|   |   |   |--- ProductRelated_Duration <= 239.88
|   |   |   |   |--- TrafficType <= 2.50
|   |   |   |   |   |--- ProductRelated_Duration <= 151.88
|   |   |   |   |   |   |--- ProductRelated_Duration <= 148.00
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |   |--- Administrative_Duration <= 19.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- Administrative_Duration >  19.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |   |--- Administrative <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.40, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  148.00
|   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |--- ProductRelated_Duration >  151.88
|   |   |   |   |   |   |--- weights: [7.95, 0.00] class: 0
|   |   |   |   |--- TrafficType >  2.50
|   |   |   |   |   |--- weights: [20.85, 0.00] class: 0
|   |   |   |--- ProductRelated_Duration >  239.88
|   |   |   |   |--- ProductRelated_Duration <= 246.71
|   |   |   |   |   |--- ExitRates <= 0.03
|   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.03
|   |   |   |   |   |   |--- Administrative_Duration <= 29.00
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- Administrative_Duration >  29.00
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |--- ProductRelated_Duration >  246.71
|   |   |   |   |   |--- Administrative_Duration <= 31.00
|   |   |   |   |   |   |--- weights: [6.45, 0.00] class: 0
|   |   |   |   |   |--- Administrative_Duration >  31.00
|   |   |   |   |   |   |--- Administrative_Duration <= 78.75
|   |   |   |   |   |   |   |--- ProductRelated <= 9.50
|   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  9.50
|   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |--- Informational <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- Region_9 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- Region_9 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Informational >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- Administrative_Duration >  78.75
|   |   |   |   |   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |--- ProductRelated_Duration >  315.08
|   |   |--- Month_Nov <= 0.50
|   |   |   |--- BounceRates <= 0.00
|   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |--- Administrative <= 0.50
|   |   |   |   |   |   |--- TrafficType <= 5.50
|   |   |   |   |   |   |   |--- Region_7 <= 0.50
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 417.60
|   |   |   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 412.92
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  412.92
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 320.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  320.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  417.60
|   |   |   |   |   |   |   |   |   |--- Month_Dec <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Browser_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 12
|   |   |   |   |   |   |   |   |   |   |--- Browser_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Dec >  0.50
|   |   |   |   |   |   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- Region_7 >  0.50
|   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |--- TrafficType >  5.50
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 353.53
|   |   |   |   |   |   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  353.53
|   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |--- Administrative >  0.50
|   |   |   |   |   |   |--- ProductRelated_Duration <= 672.77
|   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |--- Region_7 <= 0.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated <= 27.50
|   |   |   |   |   |   |   |   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 20
|   |   |   |   |   |   |   |   |   |   |--- Month_Mar >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |--- ProductRelated >  27.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Region_7 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.10, 0.00] class: 0
|   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  672.77
|   |   |   |   |   |   |   |--- Administrative <= 8.50
|   |   |   |   |   |   |   |   |--- Administrative_Duration <= 7.55
|   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative_Duration >  7.55
|   |   |   |   |   |   |   |   |   |--- ProductRelated <= 12.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 699.05
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  699.05
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ProductRelated >  12.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 808.75
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  808.75
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- Administrative >  8.50
|   |   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |--- SpecialDay_0.8 <= 0.50
|   |   |   |   |   |   |--- ProductRelated <= 7.50
|   |   |   |   |   |   |   |--- ProductRelated <= 3.50
|   |   |   |   |   |   |   |   |--- Administrative <= 2.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative >  2.50
|   |   |   |   |   |   |   |   |   |--- Administrative <= 3.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 118.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  118.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- Administrative >  3.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  3.50
|   |   |   |   |   |   |   |   |--- Region_9 <= 0.50
|   |   |   |   |   |   |   |   |   |--- Month_Dec <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- TrafficType <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- TrafficType >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [5.10, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Dec >  0.50
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- Region_9 >  0.50
|   |   |   |   |   |   |   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Mar >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- ProductRelated >  7.50
|   |   |   |   |   |   |   |--- Informational <= 0.50
|   |   |   |   |   |   |   |   |--- SpecialDay_0.2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- SpecialDay_0.4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 26
|   |   |   |   |   |   |   |   |   |   |--- SpecialDay_0.4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Mar >  0.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 53.88
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 16
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  53.88
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |--- SpecialDay_0.2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |   |   |   |--- Informational >  0.50
|   |   |   |   |   |   |   |   |--- Month_June <= 0.50
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- TrafficType <= 4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 21
|   |   |   |   |   |   |   |   |   |   |--- TrafficType >  4.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |--- Month_June >  0.50
|   |   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- SpecialDay_0.8 >  0.50
|   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [1.80, 0.00] class: 0
|   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |--- BounceRates >  0.00
|   |   |   |   |--- Administrative <= 0.50
|   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |--- ProductRelated_Duration <= 2019.35
|   |   |   |   |   |   |   |--- ProductRelated <= 49.50
|   |   |   |   |   |   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated <= 37.50
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- ProductRelated >  37.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Month_Mar >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [2.70, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  49.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 77.00
|   |   |   |   |   |   |   |   |   |--- Informational_Duration <= 129.75
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Informational_Duration >  129.75
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 57.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  57.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  77.00
|   |   |   |   |   |   |   |   |   |--- Month_Jul <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Month_Jul >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  2019.35
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 5270.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 30.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  30.00
|   |   |   |   |   |   |   |   |   |--- weights: [3.45, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  5270.50
|   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 417.52
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  417.52
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 1242.98
|   |   |   |   |   |   |   |   |   |--- Region_7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- Region_7 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  1242.98
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |--- Informational_Duration <= 800.40
|   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 321.92
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  321.92
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 884.89
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  884.89
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [12.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- Informational_Duration >  800.40
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |--- Administrative >  0.50
|   |   |   |   |   |--- Informational_Duration <= 160.75
|   |   |   |   |   |   |--- Administrative_Duration <= 2.50
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 585.97
|   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  585.97
|   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |--- Region_7 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Informational <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- Informational >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 5.10] class: 1
|   |   |   |   |   |   |   |   |   |--- Region_7 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- Administrative_Duration >  2.50
|   |   |   |   |   |   |   |--- ProductRelated <= 32.50
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 402.78
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 412.73
|   |   |   |   |   |   |   |   |   |   |--- Administrative <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Administrative >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  412.73
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  402.78
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.03
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.04
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 27
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.04
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.03
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  32.50
|   |   |   |   |   |   |   |   |--- TrafficType <= 1.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 2517.32
|   |   |   |   |   |   |   |   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [9.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Month_Mar >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  2517.32
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [4.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- TrafficType >  1.50
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 11.46
|   |   |   |   |   |   |   |   |   |   |--- weights: [4.20, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  11.46
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 6160.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 22
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  6160.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |--- Informational_Duration >  160.75
|   |   |   |   |   |   |--- ExitRates <= 0.03
|   |   |   |   |   |   |   |--- Informational_Duration <= 235.58
|   |   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 452.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  452.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |--- Informational_Duration >  235.58
|   |   |   |   |   |   |   |   |--- Informational <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Informational >  1.50
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 9319.72
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 358.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  358.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  9319.72
|   |   |   |   |   |   |   |   |   |   |--- weights: [1.35, 0.00] class: 0
|   |   |   |   |   |   |--- ExitRates >  0.03
|   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |--- Month_Nov >  0.50
|   |   |   |--- ExitRates <= 0.02
|   |   |   |   |--- Administrative <= 6.50
|   |   |   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |   |   |--- ProductRelated_Duration <= 1122.97
|   |   |   |   |   |   |   |--- ProductRelated <= 26.50
|   |   |   |   |   |   |   |   |--- Administrative <= 5.50
|   |   |   |   |   |   |   |   |   |--- Region_8 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |--- Region_8 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative >  5.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  26.50
|   |   |   |   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 52.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 14
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  52.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated_Duration >  1122.97
|   |   |   |   |   |   |   |--- Administrative <= 0.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 23.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  23.00
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |   |--- Browser_4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- Browser_4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |--- Administrative >  0.50
|   |   |   |   |   |   |   |   |--- TrafficType <= 12.00
|   |   |   |   |   |   |   |   |   |--- Informational_Duration <= 1105.38
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 19.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  19.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 17
|   |   |   |   |   |   |   |   |   |--- Informational_Duration >  1105.38
|   |   |   |   |   |   |   |   |   |   |--- Browser_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Browser_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- TrafficType >  12.00
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 33.20
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  33.20
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |   |   |--- ProductRelated <= 16.50
|   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated >  16.50
|   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |--- ProductRelated <= 118.00
|   |   |   |   |   |   |   |   |   |--- Browser_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- TrafficType <= 12.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |   |--- TrafficType >  12.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Browser_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  118.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |--- Administrative >  6.50
|   |   |   |   |   |--- ExitRates <= 0.00
|   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.00
|   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |--- Administrative <= 7.50
|   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |--- Administrative >  7.50
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 832.45
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  832.45
|   |   |   |   |   |   |   |   |   |--- ProductRelated <= 61.50
|   |   |   |   |   |   |   |   |   |   |--- Browser_6 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- Browser_6 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ProductRelated >  61.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |--- Region_8 <= 0.50
|   |   |   |   |   |   |   |   |--- Region_5 <= 0.50
|   |   |   |   |   |   |   |   |   |--- TrafficType <= 16.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 11
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 10
|   |   |   |   |   |   |   |   |   |--- TrafficType >  16.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Region_5 >  0.50
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 481.69
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  481.69
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- Region_8 >  0.50
|   |   |   |   |   |   |   |   |--- ProductRelated <= 51.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ProductRelated >  51.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |--- ExitRates >  0.02
|   |   |   |   |--- ProductRelated_Duration <= 4201.59
|   |   |   |   |   |--- ProductRelated_Duration <= 3877.48
|   |   |   |   |   |   |--- ProductRelated <= 20.50
|   |   |   |   |   |   |   |--- ProductRelated_Duration <= 2060.62
|   |   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 11.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  11.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.03
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [6.75, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.03
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- Informational <= 1.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [2.10, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Informational >  1.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- ProductRelated_Duration >  2060.62
|   |   |   |   |   |   |   |   |--- ProductRelated <= 16.50
|   |   |   |   |   |   |   |   |   |--- Administrative <= 3.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |   |   |   |   |   |   |--- Administrative >  3.00
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated >  16.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated >  20.50
|   |   |   |   |   |   |   |--- Region_8 <= 0.50
|   |   |   |   |   |   |   |   |--- Administrative_Duration <= 7.56
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |--- Browser_4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- Browser_4 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.02
|   |   |   |   |   |   |   |   |   |   |--- Region_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- Region_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- Administrative_Duration >  7.56
|   |   |   |   |   |   |   |   |   |--- Region_9 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 576.81
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 21
|   |   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  576.81
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Region_9 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- Informational <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Informational >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |   |   |   |--- Region_8 >  0.50
|   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |--- ProductRelated_Duration >  3877.48
|   |   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |--- ProductRelated_Duration >  4201.59
|   |   |   |   |   |--- ExitRates <= 0.02
|   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.02
|   |   |   |   |   |   |--- TrafficType <= 10.50
|   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |--- Informational <= 1.50
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 6323.68
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  6323.68
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Informational >  1.50
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.02
|   |   |   |   |   |   |   |   |   |   |--- Browser_5 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 15
|   |   |   |   |   |   |   |   |   |   |--- Browser_5 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.02
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- TrafficType >  10.50
|   |   |   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|--- ExitRates >  0.04
|   |--- Administrative <= 0.50
|   |   |--- Month_Oct <= 0.50
|   |   |   |--- ExitRates <= 0.08
|   |   |   |   |--- Informational_Duration <= 24.50
|   |   |   |   |   |--- Month_Nov <= 0.50
|   |   |   |   |   |   |--- ExitRates <= 0.08
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |   |--- BounceRates <= 0.02
|   |   |   |   |   |   |   |   |   |--- Region_8 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- Month_Sep <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [3.60, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Month_Sep >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Region_8 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- BounceRates >  0.02
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 153.98
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  153.98
|   |   |   |   |   |   |   |   |   |   |--- TrafficType <= 11.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 1.70] class: 1
|   |   |   |   |   |   |   |   |   |   |--- TrafficType >  11.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |   |--- Region_4 <= 0.50
|   |   |   |   |   |   |   |   |   |--- Browser_10 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 15.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  15.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |   |--- Browser_10 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  3.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Region_4 >  0.50
|   |   |   |   |   |   |   |   |   |--- TrafficType <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.20, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- TrafficType >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [7.20, 0.00] class: 0
|   |   |   |   |   |   |--- ExitRates >  0.08
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- Month_Nov >  0.50
|   |   |   |   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |--- ExitRates <= 0.05
|   |   |   |   |   |   |   |   |   |--- weights: [8.40, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- ExitRates >  0.05
|   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 74.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  74.00
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 9
|   |   |   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  7.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |--- BounceRates <= 0.02
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 34.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  34.00
|   |   |   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- BounceRates >  0.02
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.06
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated <= 22.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated >  22.00
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.06
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |   |   |   |--- weights: [4.80, 0.00] class: 0
|   |   |   |   |--- Informational_Duration >  24.50
|   |   |   |   |   |--- ProductRelated_Duration <= 1114.13
|   |   |   |   |   |   |--- Informational <= 4.00
|   |   |   |   |   |   |   |--- weights: [1.95, 0.00] class: 0
|   |   |   |   |   |   |--- Informational >  4.00
|   |   |   |   |   |   |   |--- ExitRates <= 0.05
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- ExitRates >  0.05
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- ProductRelated_Duration >  1114.13
|   |   |   |   |   |   |--- ProductRelated <= 105.50
|   |   |   |   |   |   |   |--- Month_Dec <= 0.50
|   |   |   |   |   |   |   |   |--- Month_Sep <= 0.50
|   |   |   |   |   |   |   |   |   |--- Region_4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |--- Region_4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Month_Sep >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- Month_Dec >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |--- ProductRelated >  105.50
|   |   |   |   |   |   |   |--- Informational <= 3.00
|   |   |   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |   |   |--- Informational >  3.00
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |--- ExitRates >  0.08
|   |   |   |   |--- Browser_12 <= 0.50
|   |   |   |   |   |--- ExitRates <= 0.09
|   |   |   |   |   |   |--- ExitRates <= 0.09
|   |   |   |   |   |   |   |--- ProductRelated <= 3.50
|   |   |   |   |   |   |   |   |--- TrafficType <= 2.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- TrafficType >  2.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |--- ProductRelated >  3.50
|   |   |   |   |   |   |   |   |--- weights: [19.50, 0.00] class: 0
|   |   |   |   |   |   |--- ExitRates >  0.09
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- ExitRates >  0.09
|   |   |   |   |   |   |--- BounceRates <= 0.00
|   |   |   |   |   |   |   |--- ProductRelated <= 3.50
|   |   |   |   |   |   |   |   |--- weights: [24.00, 0.00] class: 0
|   |   |   |   |   |   |   |--- ProductRelated >  3.50
|   |   |   |   |   |   |   |   |--- Month_Nov <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Month_Nov >  0.50
|   |   |   |   |   |   |   |   |   |--- TrafficType <= 1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- TrafficType >  1.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- BounceRates >  0.00
|   |   |   |   |   |   |   |--- SpecialDay_0.2 <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [108.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- SpecialDay_0.2 >  0.50
|   |   |   |   |   |   |   |   |--- weights: [2.55, 0.00] class: 0
|   |   |   |   |--- Browser_12 >  0.50
|   |   |   |   |   |--- ProductRelated <= 7.00
|   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |   |--- ProductRelated >  7.00
|   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |--- Month_Oct >  0.50
|   |   |   |--- BounceRates <= 0.01
|   |   |   |   |--- Browser_4 <= 0.50
|   |   |   |   |   |--- ProductRelated_Duration <= 23.30
|   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- ProductRelated_Duration >  23.30
|   |   |   |   |   |   |--- ProductRelated_Duration <= 426.13
|   |   |   |   |   |   |   |--- ProductRelated <= 5.00
|   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |--- ProductRelated >  5.00
|   |   |   |   |   |   |   |   |--- OperatingSystems_2 <= 0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- OperatingSystems_2 >  0.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- ProductRelated_Duration >  426.13
|   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |--- Browser_4 >  0.50
|   |   |   |   |   |--- ExitRates <= 0.08
|   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- ExitRates >  0.08
|   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |--- BounceRates >  0.01
|   |   |   |   |--- BounceRates <= 0.06
|   |   |   |   |   |--- Region_6 <= 0.50
|   |   |   |   |   |   |--- weights: [2.70, 0.00] class: 0
|   |   |   |   |   |--- Region_6 >  0.50
|   |   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |   |   |   |--- BounceRates >  0.06
|   |   |   |   |   |--- Browser_2 <= 0.50
|   |   |   |   |   |   |--- Region_8 <= 0.50
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 7.10
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  7.10
|   |   |   |   |   |   |   |   |   |--- Region_4 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Region_4 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- Region_8 >  0.50
|   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |--- Browser_2 >  0.50
|   |   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
|   |--- Administrative >  0.50
|   |   |--- Month_May <= 0.50
|   |   |   |--- Informational <= 2.50
|   |   |   |   |--- TrafficType <= 17.50
|   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |--- TrafficType <= 5.50
|   |   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |   |--- Administrative_Duration <= 109.68
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration <= 38.04
|   |   |   |   |   |   |   |   |   |   |--- Weekend_True <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |   |   |   |--- Weekend_True >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
|   |   |   |   |   |   |   |   |   |--- Administrative_Duration >  38.04
|   |   |   |   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor <= 0.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- VisitorType_Returning_Visitor >  0.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 7
|   |   |   |   |   |   |   |   |--- Administrative_Duration >  109.68
|   |   |   |   |   |   |   |   |   |--- weights: [2.25, 0.00] class: 0
|   |   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |   |--- Administrative <= 1.50
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative >  1.50
|   |   |   |   |   |   |   |   |   |--- Browser_3 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration <= 2587.58
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 5.10] class: 1
|   |   |   |   |   |   |   |   |   |   |--- ProductRelated_Duration >  2587.58
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Browser_3 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |--- TrafficType >  5.50
|   |   |   |   |   |   |   |--- weights: [2.85, 0.00] class: 0
|   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |--- BounceRates <= 0.04
|   |   |   |   |   |   |   |--- Browser_8 <= 0.50
|   |   |   |   |   |   |   |   |--- Month_Jul <= 0.50
|   |   |   |   |   |   |   |   |   |--- Region_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.07
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [16.80, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.07
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |   |--- Region_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.05
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [1.65, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.05
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- Month_Jul >  0.50
|   |   |   |   |   |   |   |   |   |--- Browser_2 <= 0.50
|   |   |   |   |   |   |   |   |   |   |--- weights: [0.90, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |--- Browser_2 >  0.50
|   |   |   |   |   |   |   |   |   |   |--- Administrative <= 1.50
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |   |   |   |   |   |--- Administrative >  1.50
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
|   |   |   |   |   |   |   |--- Browser_8 >  0.50
|   |   |   |   |   |   |   |   |--- Administrative_Duration <= 134.10
|   |   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |   |--- Administrative_Duration >  134.10
|   |   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- BounceRates >  0.04
|   |   |   |   |   |   |   |--- Month_Oct <= 0.50
|   |   |   |   |   |   |   |   |--- Administrative <= 4.50
|   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.07
|   |   |   |   |   |   |   |   |   |   |--- ExitRates <= 0.07
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 8
|   |   |   |   |   |   |   |   |   |   |--- ExitRates >  0.07
|   |   |   |   |   |   |   |   |   |   |   |--- weights: [0.00, 3.40] class: 1
|   |   |   |   |   |   |   |   |   |--- ExitRates >  0.07
|   |   |   |   |   |   |   |   |   |   |--- BounceRates <= 0.11
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
|   |   |   |   |   |   |   |   |   |   |--- BounceRates >  0.11
|   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
|   |   |   |   |   |   |   |   |--- Administrative >  4.50
|   |   |   |   |   |   |   |   |   |--- weights: [1.50, 0.00] class: 0
|   |   |   |   |   |   |   |--- Month_Oct >  0.50
|   |   |   |   |   |   |   |   |--- weights: [2.10, 0.00] class: 0
|   |   |   |   |--- TrafficType >  17.50
|   |   |   |   |   |--- Administrative_Duration <= 47.98
|   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |--- Administrative_Duration >  47.98
|   |   |   |   |   |   |--- weights: [0.00, 2.55] class: 1
|   |   |   |--- Informational >  2.50
|   |   |   |   |--- Administrative <= 1.50
|   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|   |   |   |   |--- Administrative >  1.50
|   |   |   |   |   |--- weights: [3.30, 0.00] class: 0
|   |   |--- Month_May >  0.50
|   |   |   |--- Informational <= 2.50
|   |   |   |   |--- Administrative_Duration <= 1.00
|   |   |   |   |   |--- BounceRates <= 0.04
|   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- BounceRates >  0.04
|   |   |   |   |   |   |--- weights: [1.80, 0.00] class: 0
|   |   |   |   |--- Administrative_Duration >  1.00
|   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |   |   |--- weights: [2.70, 0.00] class: 0
|   |   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |   |--- Informational <= 0.50
|   |   |   |   |   |   |   |   |--- weights: [0.15, 0.00] class: 0
|   |   |   |   |   |   |   |--- Informational >  0.50
|   |   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |   |--- weights: [16.20, 0.00] class: 0
|   |   |   |--- Informational >  2.50
|   |   |   |   |--- Informational_Duration <= 81.75
|   |   |   |   |   |--- ProductRelated <= 22.50
|   |   |   |   |   |   |--- weights: [0.45, 0.00] class: 0
|   |   |   |   |   |--- ProductRelated >  22.50
|   |   |   |   |   |   |--- Administrative_Duration <= 70.67
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |   |   |--- Administrative_Duration >  70.67
|   |   |   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |   |   |   |--- Informational_Duration >  81.75
|   |   |   |   |   |--- weights: [1.65, 0.00] class: 0

In [79]:
# Importance of features in the tree building. The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

print(pd.DataFrame(model.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(by='Imp', ascending=False))
                                        Imp
ExitRates                      2.416569e-01
ProductRelated_Duration        1.465191e-01
ProductRelated                 9.449226e-02
BounceRates                    7.290181e-02
Administrative_Duration        6.835291e-02
Administrative                 5.613283e-02
TrafficType                    4.425661e-02
Informational_Duration         2.540896e-02
Month_Nov                      2.122725e-02
Informational                  1.903085e-02
Month_Mar                      1.606716e-02
OperatingSystems_3             1.504970e-02
Weekend_True                   1.461695e-02
Month_May                      1.332757e-02
VisitorType_Returning_Visitor  1.262687e-02
Region_3                       1.121948e-02
Month_Dec                      1.003689e-02
Browser_4                      9.214118e-03
OperatingSystems_2             9.191157e-03
Region_4                       8.810003e-03
Month_Oct                      7.483384e-03
Region_6                       7.371024e-03
Region_9                       7.332440e-03
Region_8                       6.962631e-03
Browser_2                      6.922846e-03
Month_Sep                      5.350803e-03
Region_7                       5.312785e-03
Region_2                       5.048876e-03
Month_June                     4.625699e-03
Region_5                       3.991111e-03
Browser_6                      3.505511e-03
Month_Jul                      3.232088e-03
OperatingSystems_4             2.911298e-03
SpecialDay_0.6                 2.470922e-03
Browser_5                      2.313052e-03
SpecialDay_0.8                 2.310706e-03
OperatingSystems_8             2.281109e-03
SpecialDay_0.2                 1.981349e-03
Browser_8                      1.698666e-03
Month_Feb                      1.425918e-03
Browser_3                      1.349077e-03
Browser_10                     1.297763e-03
Browser_12                     1.082846e-03
SpecialDay_0.4                 9.984487e-04
Browser_7                      3.843297e-04
VisitorType_Other              2.159655e-04
SpecialDay_1.0                 4.305383e-18
OperatingSystems_7             0.000000e+00
OperatingSystems_6             0.000000e+00
OperatingSystems_5             0.000000e+00
Browser_11                     0.000000e+00
Browser_9                      0.000000e+00
Browser_13                     0.000000e+00
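The comment in the cell above describes impurity-based (Gini) importance. As a rough sketch, that definition can be checked by hand using scikit-learn's fitted `tree_` arrays (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`): each split adds its weighted impurity decrease to the feature it splits on, and the totals are normalized to sum to 1. The helper `manual_importances` and the synthetic data from `make_classification` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def manual_importances(clf):
    """Recompute impurity-based importances from a fitted tree (illustrative helper)."""
    t = clf.tree_
    w = t.weighted_n_node_samples
    imp = np.zeros(t.n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, so no contribution
            continue
        # weighted impurity decrease produced by the split at this node
        imp[t.feature[node]] += (w[node] * t.impurity[node]
                                 - w[left] * t.impurity[left]
                                 - w[right] * t.impurity[right])
    return imp / imp.sum()  # normalize so the importances sum to 1


# Synthetic stand-in data (assumption), not the shoppers dataset
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(np.allclose(manual_importances(clf), clf.feature_importances_))
```

A feature that is never used for a split keeps an importance of exactly 0, which is why several dummy columns above show 0.000000e+00.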
In [80]:
importances = model.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
  • According to the decision tree model, ExitRates is the most important variable for predicting Revenue, followed by ProductRelated_Duration and ProductRelated.

The tree above is very complex and difficult to interpret.

Reducing overfitting

Using GridSearchCV for hyperparameter tuning of our tree model

  • Hyperparameter tuning is tricky because there is no direct way to calculate how a change in a hyperparameter value will reduce the loss of the model, so we usually resort to experimentation, i.e. we'll use grid search.
  • Grid search is a tuning technique that looks for the best values of the hyperparameters among a predefined set of candidates.
  • It is an exhaustive search: every combination of the specified parameter values is evaluated.
  • The estimator's parameters are optimized by cross-validated grid search over the parameter grid: each combination is scored with k-fold cross-validation and the best-scoring one is kept.
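To make the "exhaustive search" idea concrete, here is a minimal sketch of what cross-validated grid search does: loop over every combination in the grid, score each one with 5-fold cross-validated recall, and keep the best. It uses synthetic imbalanced data from `make_classification` and a deliberately small grid (both assumptions for illustration), then compares the result with `GridSearchCV` on the same folds.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the shoppers data (assumption)
X, y = make_classification(n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=1)

grid = {"max_depth": [2, 3, 4, 5], "criterion": ["gini", "entropy"]}

best_score, best_params = -np.inf, None
# Exhaustive search: every combination of the listed values is tried
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    clf = DecisionTreeClassifier(random_state=1, **params)
    # Mean 5-fold cross-validated recall for this combination
    score = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
    if score > best_score:
        best_score, best_params = score, params

# The library version of the same search, for comparison
gs = GridSearchCV(DecisionTreeClassifier(random_state=1), grid, cv=5, scoring="recall").fit(X, y)
print(best_params, round(best_score, 4))
print(gs.best_params_, round(gs.best_score_, 4))
```

Both searches use the same stratified 5-fold splits, so the best mean recall should match; if two combinations tie, the reported best parameters may differ because the two loops visit the combinations in a different order.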
In [81]:
from sklearn.model_selection import GridSearchCV
In [82]:
# Choose the type of classifier. 
estimator = DecisionTreeClassifier(random_state=1,class_weight = {0:.15,1:.85})

# Grid of parameters to choose from
parameters = {
            'max_depth': np.arange(1,10),
            'criterion': ['entropy','gini'],
            'splitter': ['best','random'],
            'min_impurity_decrease': [0.000001,0.00001,0.0001],
            'max_features': ['log2','sqrt']
             }

# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.recall_score)

# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)

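# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_

# Fit the best algorithm to the data. GridSearchCV (refit=True by default) has already
# refit best_estimator_ on the full training set, so this call repeats that work but is harmless.
estimator.fit(X_train, y_train)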
Out[82]:
DecisionTreeClassifier(class_weight={0: 0.15, 1: 0.85}, max_depth=5,
                       max_features='sqrt', min_impurity_decrease=1e-05,
                       random_state=1)
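Because grid search is exhaustive, its cost grows with the product of the grid sizes. As a quick check on the grid used in the cell above, scikit-learn's `ParameterGrid` can enumerate the candidates: 9 depths × 2 criteria × 2 splitters × 3 impurity thresholds × 2 `max_features` options gives 216 combinations, and with `cv=5` that is 1080 model fits.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell above
parameters = {
    'max_depth': np.arange(1, 10),
    'criterion': ['entropy', 'gini'],
    'splitter': ['best', 'random'],
    'min_impurity_decrease': [0.000001, 0.00001, 0.0001],
    'max_features': ['log2', 'sqrt'],
}

n_candidates = len(ParameterGrid(parameters))  # every combination of values
n_folds = 5                                    # cv=5 in GridSearchCV
n_fits = n_candidates * n_folds
print(n_candidates, n_fits)  # 216 1080
```

With the default `refit=True`, `GridSearchCV` performs one additional fit of the best combination on the whole training set, which is what `best_estimator_` holds.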
In [83]:
make_confusion_matrix(estimator,y_test)
In [84]:
get_recall_score(estimator)
Recall on training set :  0.8393939393939394
Recall on test set :  0.8112244897959183

After hyperparameter tuning, recall on the training set (0.84) and on the test set (0.81) is close, so we have a generalized model rather than one that overfits the training data.
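Recall is the metric optimized throughout this notebook: the share of actual buyers (class 1) that the model flags, TP / (TP + FN). A small sketch with made-up labels (not notebook data) shows that `recall_score` matches the value computed from the confusion matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels for illustration only: 4 buyers, 6 non-buyers
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual = tp / (tp + fn)  # 3 of the 4 buyers were found
print(manual, recall_score(y_true, y_pred))  # 0.75 0.75
```

The false positive (a non-buyer predicted as a buyer) does not affect recall, which is why recall alone should be read together with the confusion matrix.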

Visualizing the Decision Tree

In [85]:
plt.figure(figsize=(15,10))
out = tree.plot_tree(estimator,feature_names=feature_names,filled=True,fontsize=9,node_ids=False,class_names=None)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor('black')
        arrow.set_linewidth(1)
plt.show()
In [86]:
# Text report showing the rules of a decision tree -

print(tree.export_text(estimator,feature_names=feature_names,show_weights=True))
|--- Month_Nov <= 0.50
|   |--- Administrative_Duration <= 8.05
|   |   |--- Administrative <= 0.50
|   |   |   |--- BounceRates <= 0.01
|   |   |   |   |--- ProductRelated <= 6.50
|   |   |   |   |   |--- weights: [80.85, 14.45] class: 0
|   |   |   |   |--- ProductRelated >  6.50
|   |   |   |   |   |--- weights: [126.75, 146.20] class: 1
|   |   |   |--- BounceRates >  0.01
|   |   |   |   |--- ProductRelated <= 5.50
|   |   |   |   |   |--- weights: [76.50, 0.85] class: 0
|   |   |   |   |--- ProductRelated >  5.50
|   |   |   |   |   |--- weights: [134.55, 27.20] class: 0
|   |   |--- Administrative >  0.50
|   |   |   |--- Browser_6 <= 0.50
|   |   |   |   |--- Month_Oct <= 0.50
|   |   |   |   |   |--- weights: [28.50, 22.10] class: 0
|   |   |   |   |--- Month_Oct >  0.50
|   |   |   |   |   |--- weights: [1.35, 6.80] class: 1
|   |   |   |--- Browser_6 >  0.50
|   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |--- Administrative_Duration >  8.05
|   |   |--- ProductRelated_Duration <= 393.20
|   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |--- Browser_3 <= 0.50
|   |   |   |   |   |--- weights: [89.40, 70.55] class: 0
|   |   |   |   |--- Browser_3 >  0.50
|   |   |   |   |   |--- weights: [0.60, 0.00] class: 0
|   |   |   |--- Month_Mar >  0.50
|   |   |   |   |--- ProductRelated_Duration <= 180.92
|   |   |   |   |   |--- weights: [9.75, 0.00] class: 0
|   |   |   |   |--- ProductRelated_Duration >  180.92
|   |   |   |   |   |--- weights: [11.85, 5.95] class: 0
|   |   |--- ProductRelated_Duration >  393.20
|   |   |   |--- Region_9 <= 0.50
|   |   |   |   |--- SpecialDay_0.4 <= 0.50
|   |   |   |   |   |--- weights: [272.85, 366.35] class: 1
|   |   |   |   |--- SpecialDay_0.4 >  0.50
|   |   |   |   |   |--- weights: [4.80, 3.40] class: 0
|   |   |   |--- Region_9 >  0.50
|   |   |   |   |--- BounceRates <= 0.02
|   |   |   |   |   |--- weights: [9.00, 19.55] class: 1
|   |   |   |   |--- BounceRates >  0.02
|   |   |   |   |   |--- weights: [0.75, 0.00] class: 0
|--- Month_Nov >  0.50
|   |--- Administrative <= 0.50
|   |   |--- Weekend_True <= 0.50
|   |   |   |--- ProductRelated <= 5.50
|   |   |   |   |--- BounceRates <= 0.07
|   |   |   |   |   |--- weights: [15.00, 5.10] class: 0
|   |   |   |   |--- BounceRates >  0.07
|   |   |   |   |   |--- weights: [14.10, 0.00] class: 0
|   |   |   |--- ProductRelated >  5.50
|   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |--- weights: [40.05, 60.35] class: 1
|   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |--- weights: [22.05, 11.05] class: 0
|   |   |--- Weekend_True >  0.50
|   |   |   |--- OperatingSystems_3 <= 0.50
|   |   |   |   |--- ProductRelated_Duration <= 337.12
|   |   |   |   |   |--- weights: [7.95, 3.40] class: 0
|   |   |   |   |--- ProductRelated_Duration >  337.12
|   |   |   |   |   |--- weights: [12.90, 30.60] class: 1
|   |   |   |--- OperatingSystems_3 >  0.50
|   |   |   |   |--- ProductRelated_Duration <= 286.85
|   |   |   |   |   |--- weights: [4.20, 0.00] class: 0
|   |   |   |   |--- ProductRelated_Duration >  286.85
|   |   |   |   |   |--- weights: [5.10, 5.10] class: 0
|   |--- Administrative >  0.50
|   |   |--- Browser_3 <= 0.50
|   |   |   |--- ProductRelated <= 25.50
|   |   |   |   |--- BounceRates <= 0.01
|   |   |   |   |   |--- weights: [31.65, 51.00] class: 1
|   |   |   |   |--- BounceRates >  0.01
|   |   |   |   |   |--- weights: [15.45, 11.05] class: 0
|   |   |   |--- ProductRelated >  25.50
|   |   |   |   |--- Region_5 <= 0.50
|   |   |   |   |   |--- weights: [64.20, 258.40] class: 1
|   |   |   |   |--- Region_5 >  0.50
|   |   |   |   |   |--- weights: [1.65, 2.55] class: 1
|   |   |--- Browser_3 >  0.50
|   |   |   |--- weights: [1.05, 0.00] class: 0
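The `weights` printed by `export_text(..., show_weights=True)` in the rules above are class-weighted sample counts. Assuming no separate `sample_weight` was passed to `fit`, dividing each entry by its class weight (0.15 for class 0, 0.85 for class 1) recovers approximate raw counts. The helper `raw_counts` is a hypothetical name used only for this illustration; the leaf values are copied from the rules above.

```python
class_weight = {0: 0.15, 1: 0.85}  # same class_weight as the tuned tree


def raw_counts(weights, class_weight):
    """Convert export_text leaf weights back to approximate raw sample counts."""
    return [round(w / class_weight[c]) for c, w in enumerate(weights)]


# Leaf: Month_Nov <= 0.50, Administrative_Duration > 8.05,
#       ProductRelated_Duration > 393.20, Region_9 <= 0.50, SpecialDay_0.4 <= 0.50
leaf = [272.85, 366.35]
counts = raw_counts(leaf, class_weight)
predicted = max(range(len(leaf)), key=lambda c: leaf[c])  # class with the larger weight
print(counts, "predicted class:", predicted)  # [1819, 431] predicted class: 1
```

This leaf is labelled class 1 even though it holds roughly four times as many non-buyers as buyers: the 0.85 weight on buyers tips the weighted majority, which is exactly how the class weights push the tree toward higher recall.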

In [87]:
# Importance of features in the tree building. The importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance.

# Importance is now concentrated in fewer features: only 15 features have non-zero importance in the tuned tree
print(pd.DataFrame(estimator.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(by='Imp', ascending=False))
                                    Imp
ProductRelated                 0.204081
Month_Nov                      0.198957
BounceRates                    0.184943
Administrative_Duration        0.172282
Administrative                 0.115608
ProductRelated_Duration        0.072127
Month_Mar                      0.011929
Month_Oct                      0.011273
OperatingSystems_3             0.008181
Browser_3                      0.007023
Weekend_True                   0.005856
Region_9                       0.002630
SpecialDay_0.4                 0.002066
Region_5                       0.001583
Browser_6                      0.001460
SpecialDay_0.6                 0.000000
OperatingSystems_5             0.000000
OperatingSystems_7             0.000000
Browser_7                      0.000000
OperatingSystems_6             0.000000
Browser_8                      0.000000
SpecialDay_0.8                 0.000000
Browser_9                      0.000000
Browser_10                     0.000000
Browser_5                      0.000000
Browser_11                     0.000000
Browser_12                     0.000000
OperatingSystems_8             0.000000
Browser_13                     0.000000
OperatingSystems_2             0.000000
OperatingSystems_4             0.000000
SpecialDay_0.2                 0.000000
Region_7                       0.000000
Browser_4                      0.000000
Browser_2                      0.000000
Informational                  0.000000
Informational_Duration         0.000000
ExitRates                      0.000000
TrafficType                    0.000000
Month_Dec                      0.000000
Month_Feb                      0.000000
Month_Jul                      0.000000
Month_June                     0.000000
Month_May                      0.000000
Month_Sep                      0.000000
VisitorType_Other              0.000000
VisitorType_Returning_Visitor  0.000000
Region_2                       0.000000
Region_3                       0.000000
Region_4                       0.000000
Region_6                       0.000000
Region_8                       0.000000
SpecialDay_1.0                 0.000000
In [88]:
importances = estimator.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()

Cost Complexity Pruning

The DecisionTreeClassifier provides parameters such as min_samples_leaf and max_depth to prevent a tree from overfitting. Cost complexity pruning provides another option to control the size of a tree. In DecisionTreeClassifier, this pruning technique is parameterized by the cost complexity parameter, ccp_alpha. Greater values of ccp_alpha increase the number of nodes pruned. Here we only show the effect of ccp_alpha on regularizing the trees and how to choose a ccp_alpha based on validation scores.
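For reference, minimal cost-complexity pruning (as described in the scikit-learn user guide) looks for the subtree that minimizes the cost-complexity measure below, where R(T) is the total sample-weighted impurity of the leaves of tree T, |T̃| is its number of leaves and α ≥ 0 is ccp_alpha. The effective alpha of an internal node t compares the impurity of t collapsed into a single leaf, R(t), with the impurity of the branch T_t rooted at t.

```latex
\begin{aligned}
R_\alpha(T) &= R(T) + \alpha\,|\widetilde{T}| \\
\alpha_{\text{eff}}(t) &= \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
\end{aligned}
```

A branch is worth keeping only while α is smaller than its effective alpha; once α exceeds it, collapsing the branch lowers R_α(T).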

Total impurity of leaves vs effective alphas of pruned tree

Minimal cost complexity pruning recursively finds the node with the "weakest link". The weakest link is characterized by an effective alpha, where the nodes with the smallest effective alpha are pruned first. To get an idea of what values of ccp_alpha could be appropriate, scikit-learn provides DecisionTreeClassifier.cost_complexity_pruning_path that returns the effective alphas and the corresponding total leaf impurities at each step of the pruning process. As alpha increases, more of the tree is pruned, which increases the total impurity of its leaves.
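The "weakest link" can be located by hand. A minimal sketch, assuming scikit-learn's fitted `tree_` arrays and that R(t) is a node's impurity weighted by its share of the training samples: compute the effective alpha (R(t) − R(T_t)) / (number of leaves under t − 1) for every internal node and take the smallest. On synthetic data (an assumption for illustration), this should match the first non-zero step of `cost_complexity_pruning_path`, whose first entry is the unpruned tree at alpha 0. The helper `effective_alphas` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def effective_alphas(clf):
    """Effective alpha of every internal node of a fitted tree (illustrative helper)."""
    t = clf.tree_
    w = t.weighted_n_node_samples
    # R(t): node impurity weighted by the node's share of the training samples
    r_node = t.impurity * w / w[0]

    def subtree(node):
        # (sum of R over the leaves below node, number of those leaves)
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:
            return r_node[node], 1
        r_l, n_l = subtree(left)
        r_r, n_r = subtree(right)
        return r_l + r_r, n_l + n_r

    alphas = {}
    for node in range(t.node_count):
        if t.children_left[node] != -1:  # internal nodes only
            r_branch, n_leaves = subtree(node)
            alphas[node] = (r_node[node] - r_branch) / (n_leaves - 1)
    return alphas


X, y = make_classification(n_samples=400, n_features=5, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
alphas = effective_alphas(clf)
weakest = min(alphas, key=alphas.get)  # node pruned first
path = clf.cost_complexity_pruning_path(X, y)
print("weakest link node:", weakest, "effective alpha:", alphas[weakest])
print("first pruning step on sklearn's path:", path.ccp_alphas[1])
```

After that branch is collapsed, the effective alphas of its ancestors change, which is why scikit-learn recomputes them at every step rather than sorting them once.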

In [89]:
clf = DecisionTreeClassifier(random_state=1,class_weight = {0:0.15,1:0.85})
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
In [90]:
pd.DataFrame(path)
Out[90]:
       ccp_alphas    impurities
0    0.000000e+00 -1.307387e-15
1    4.530598e-20 -1.307342e-15
2    4.530598e-20 -1.307296e-15
3    4.530598e-20 -1.307251e-15
4    9.061196e-20 -1.307161e-15
...           ...           ...
842  4.488629e-03  4.113849e-01
843  8.348516e-03  4.197334e-01
844  8.668623e-03  4.284020e-01
845  1.409487e-02  4.424969e-01
846  5.735035e-02  4.998472e-01

847 rows × 2 columns

In [91]:
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()

Next, we train one decision tree for each effective alpha. The last value in ccp_alphas is the alpha that prunes the whole tree, leaving the corresponding tree, clfs[-1], with a single node.

In [92]:
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha,class_weight = {0:0.15,1:0.85})
    clf.fit(X_train, y_train)
    clfs.append(clf)
print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
      clfs[-1].tree_.node_count, ccp_alphas[-1]))
Number of nodes in the last tree is: 1 with ccp_alpha: 0.05735035417016343

For the remainder, we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node. The plots below show that the number of nodes and the tree depth decrease as alpha increases.

In [93]:
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]

node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1,figsize=(10,7))
ax[0].plot(ccp_alphas, node_counts, marker='o', drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker='o', drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
In [94]:
recall_train=[]
for clf in clfs:
    pred_train3=clf.predict(X_train)
    values_train=metrics.recall_score(y_train,pred_train3)
    recall_train.append(values_train)
In [95]:
recall_test=[]
for clf in clfs:
    pred_test3=clf.predict(X_test)
    values_test=metrics.recall_score(y_test,pred_test3)
    recall_test.append(values_test)
In [96]:
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
In [97]:
fig, ax = plt.subplots(figsize=(15,5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker='o', label="train",
        drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker='o', label="test",
        drawstyle="steps-post")
ax.legend()
plt.show()

The maximum test recall is reached at alpha ≈ 0.014, but at that alpha the tree is reduced to a single split (a root and two leaves), so we would lose the business rules. Instead, we choose alpha = 0.002, which keeps a more informative tree while still giving a high recall.
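How small the alpha ≈ 0.014 tree is can be checked from the tail of Out[90]. Collapsing a node t raises the total leaf impurity by its effective alpha times (leaves of T_t minus 1), so dividing each step's impurity increase by that step's alpha estimates how many leaves the step removes. A minimal sketch using the printed (rounded) values; the helper name leaves_removed is hypothetical:

```python
# Last five rows of the pruning path (Out[90], rows 842-846), copied as printed.
ccp_alphas_tail = [4.488629e-03, 8.348516e-03, 8.668623e-03, 1.409487e-02, 5.735035e-02]
impurities_tail = [4.113849e-01, 4.197334e-01, 4.284020e-01, 4.424969e-01, 4.998472e-01]


def leaves_removed(alphas, impurities):
    """Leaves removed at each pruning step, estimated as impurity increase / alpha."""
    return [
        round((impurities[i] - impurities[i - 1]) / alphas[i])
        for i in range(1, len(alphas))
    ]


steps = leaves_removed(ccp_alphas_tail, impurities_tail)
print(steps)  # [1, 1, 1, 1]

# The last step ends at a single leaf (the root), so the tree just before it
# had 1 + steps[-1] leaves; a binary tree with L leaves has 2L - 1 nodes.
leaves_before_last_step = 1 + steps[-1]
nodes_before_last_step = 2 * leaves_before_last_step - 1
print(leaves_before_last_step, nodes_before_last_step)  # 2 3
```

Because the printed values are rounded, each ratio is only close to an integer, which is why the sketch rounds it.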

In [98]:
# select the tree with the highest test recall (np.argmax returns the first such index)
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.014094867070919892,
                       class_weight={0: 0.15, 1: 0.85}, random_state=1)
In [99]:
best_model.fit(X_train, y_train)
Out[99]:
DecisionTreeClassifier(ccp_alpha=0.014094867070919892,
                       class_weight={0: 0.15, 1: 0.85}, random_state=1)
In [100]:
make_confusion_matrix(best_model,y_test)
In [101]:
get_recall_score(best_model)
Recall on training set :  0.9340909090909091
Recall on test set :  0.9302721088435374

Visualizing the Decision Tree

In [102]:
plt.figure(figsize=(5,5))

out = tree.plot_tree(best_model,feature_names=feature_names,filled=True,fontsize=9,node_ids=False,class_names=None)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor('black')
        arrow.set_linewidth(1)
plt.show()
  • This model gives the highest recall, but it is too small to provide rules that a business could use to target potential customers.

Creating a model with ccp_alpha = 0.002

In [103]:
best_model2 = DecisionTreeClassifier(ccp_alpha=0.002,
                       class_weight={0: 0.15, 1: 0.85}, random_state=1)
best_model2.fit(X_train, y_train)
Out[103]:
DecisionTreeClassifier(ccp_alpha=0.002, class_weight={0: 0.15, 1: 0.85},
                       random_state=1)
In [104]:
make_confusion_matrix(best_model2,y_test)
  • This model identifies more true positives (sessions that actually contribute to revenue) than the initial and hyperparameter-tuned decision trees.
In [105]:
get_recall_score(best_model2)
Recall on training set :  0.8909090909090909
Recall on test set :  0.8673469387755102
  • The results improve on the initial model, and recall is higher than for the hyperparameter-tuned model. The tree also generalizes well, with comparable recall on the training and test sets (0.89 vs 0.87).

Visualizing the Decision Tree

In [106]:
plt.figure(figsize=(15,10))

out = tree.plot_tree(best_model2,feature_names=feature_names,filled=True,fontsize=9,node_ids=False,class_names=None)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor('black')
        arrow.set_linewidth(1)
plt.show()
In [107]:
# Text report showing the rules of the decision tree

print(tree.export_text(best_model2,feature_names=feature_names,show_weights=True))
|--- ExitRates <= 0.04
|   |--- ProductRelated_Duration <= 315.08
|   |   |--- Month_May <= 0.50
|   |   |   |--- Month_Mar <= 0.50
|   |   |   |   |--- Month_Dec <= 0.50
|   |   |   |   |   |--- weights: [69.00, 83.30] class: 1
|   |   |   |   |--- Month_Dec >  0.50
|   |   |   |   |   |--- weights: [26.55, 9.35] class: 0
|   |   |   |--- Month_Mar >  0.50
|   |   |   |   |--- weights: [32.40, 4.25] class: 0
|   |   |--- Month_May >  0.50
|   |   |   |--- weights: [44.70, 6.80] class: 0
|   |--- ProductRelated_Duration >  315.08
|   |   |--- Month_Nov <= 0.50
|   |   |   |--- BounceRates <= 0.00
|   |   |   |   |--- weights: [191.70, 368.90] class: 1
|   |   |   |--- BounceRates >  0.00
|   |   |   |   |--- Administrative <= 0.50
|   |   |   |   |   |--- weights: [63.00, 28.05] class: 0
|   |   |   |   |--- Administrative >  0.50
|   |   |   |   |   |--- weights: [157.80, 160.65] class: 1
|   |   |--- Month_Nov >  0.50
|   |   |   |--- ExitRates <= 0.02
|   |   |   |   |--- weights: [63.90, 250.75] class: 1
|   |   |   |--- ExitRates >  0.02
|   |   |   |   |--- weights: [70.05, 136.00] class: 1
|--- ExitRates >  0.04
|   |--- Administrative <= 0.50
|   |   |--- weights: [289.05, 34.00] class: 0
|   |--- Administrative >  0.50
|   |   |--- weights: [75.30, 39.95] class: 0
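
The weights in this report are class-weighted session counts: since fit was called without sample_weight, dividing by the class weights {0: 0.15, 1: 0.85} gives whole numbers of training sessions (for example 289.05 / 0.15 = 1927 and 34.00 / 0.85 = 40). A leaf predicts class 1 whenever 0.85·n1 > 0.15·n0, which is equivalent to more than 15% of its sessions purchasing; this is why the leaf [157.80, 160.65] is labeled class 1 although only 189 of its 1241 sessions purchased. The sketch below copies the leaf weights by hand from the report (the names are illustrative), recovers the counts, and reproduces the training recall of 0.8909 reported by get_recall_score:

```python
# Leaf weights [class 0, class 1] copied from the export_text report above, in report order.
CLASS_WEIGHT = {0: 0.15, 1: 0.85}
LEAF_WEIGHTS = [
    [69.00, 83.30], [26.55, 9.35], [32.40, 4.25], [44.70, 6.80],
    [191.70, 368.90], [63.00, 28.05], [157.80, 160.65],
    [63.90, 250.75], [70.05, 136.00], [289.05, 34.00], [75.30, 39.95],
]


def to_counts(weights):
    """Undo the class weighting: weighted count / class weight = number of sessions."""
    return [round(w / CLASS_WEIGHT[k]) for k, w in enumerate(weights)]


true_pos = 0
total_pos = 0
for weights in LEAF_WEIGHTS:
    n0, n1 = to_counts(weights)
    predicted = 1 if weights[1] > weights[0] else 0  # larger *weighted* count wins
    total_pos += n1
    if predicted == 1:
        true_pos += n1  # purchasing sessions that land in a class-1 leaf
    print(f"class 0: {n0:5d}  class 1: {n1:4d}  "
          f"purchase rate: {n1 / (n0 + n1):.2f}  predicted: {predicted}")

train_recall = true_pos / total_pos
print(true_pos, total_pos, round(train_recall, 4))  # 1176 1320 0.8909
```

The same arithmetic shows why the recommendations below should be read as relative likelihoods: even the class-1 leaves have purchase rates well below 50%.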

In [108]:
# Importance of features in the tree building: the importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance.

print(pd.DataFrame(best_model2.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(by="Imp", ascending=False))
                                    Imp
ExitRates                      0.555544
ProductRelated_Duration        0.131677
Month_Nov                      0.080984
BounceRates                    0.077994
Administrative                 0.065073
Month_Mar                      0.036807
Month_May                      0.031719
Month_Dec                      0.020203
SpecialDay_0.4                 0.000000
OperatingSystems_2             0.000000
Browser_5                      0.000000
Browser_6                      0.000000
Browser_7                      0.000000
Browser_8                      0.000000
Browser_9                      0.000000
Browser_10                     0.000000
Browser_11                     0.000000
Browser_12                     0.000000
Browser_13                     0.000000
OperatingSystems_4             0.000000
OperatingSystems_3             0.000000
SpecialDay_0.6                 0.000000
OperatingSystems_5             0.000000
Browser_3                      0.000000
OperatingSystems_6             0.000000
OperatingSystems_7             0.000000
OperatingSystems_8             0.000000
SpecialDay_0.2                 0.000000
SpecialDay_0.8                 0.000000
Browser_4                      0.000000
Region_7                       0.000000
Browser_2                      0.000000
Region_9                       0.000000
Informational                  0.000000
Informational_Duration         0.000000
ProductRelated                 0.000000
TrafficType                    0.000000
Month_Feb                      0.000000
Month_Jul                      0.000000
Month_June                     0.000000
Month_Oct                      0.000000
Month_Sep                      0.000000
VisitorType_Other              0.000000
VisitorType_Returning_Visitor  0.000000
Weekend_True                   0.000000
Region_2                       0.000000
Region_3                       0.000000
Region_4                       0.000000
Region_5                       0.000000
Region_6                       0.000000
Administrative_Duration        0.000000
Region_8                       0.000000
SpecialDay_1.0                 0.000000
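
The Gini importance described in the comment of In [108] can be reproduced by hand from the export_text report: for every split, take the node's weight times its Gini impurity, subtract the same quantity for its two children, add the result to the split feature, and finally normalize the totals to sum to 1. A pure-Python sketch with the tree structure and leaf weights transcribed from the report (TREE, walk and weighted_gini are illustrative names); it should agree with the printed importances to about four decimal places, and any feature that never appears in a split gets 0:

```python
from collections import defaultdict

# Leaves are [class-0 weight, class-1 weight]; internal nodes are
# (split feature, "<=" branch, ">" branch), transcribed from the report above.
month_dec = ("Month_Dec", [69.00, 83.30], [26.55, 9.35])
month_mar = ("Month_Mar", month_dec, [32.40, 4.25])
month_may = ("Month_May", month_mar, [44.70, 6.80])
admin_bounced = ("Administrative", [63.00, 28.05], [157.80, 160.65])
bounce = ("BounceRates", [191.70, 368.90], admin_bounced)
exit_nov = ("ExitRates", [63.90, 250.75], [70.05, 136.00])
month_nov = ("Month_Nov", bounce, exit_nov)
duration = ("ProductRelated_Duration", month_may, month_nov)
admin_high_exit = ("Administrative", [289.05, 34.00], [75.30, 39.95])
TREE = ("ExitRates", duration, admin_high_exit)


def weighted_gini(w):
    """Node weight times Gini impurity: N * (1 - sum_k (w_k / N)^2)."""
    n = sum(w)
    return n * (1.0 - sum((x / n) ** 2 for x in w))


def walk(node, decrease):
    """Return a node's class weights, adding each split's impurity decrease to its feature."""
    if isinstance(node, list):  # leaf
        return node
    feature, left, right = node
    w_left = walk(left, decrease)
    w_right = walk(right, decrease)
    w = [a + b for a, b in zip(w_left, w_right)]
    decrease[feature] += weighted_gini(w) - weighted_gini(w_left) - weighted_gini(w_right)
    return w


decrease = defaultdict(float)
walk(TREE, decrease)
total = sum(decrease.values())
importance = {feature: d / total for feature, d in decrease.items()}

for feature, imp in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature:25s} {imp:.6f}")
```

In this tree the root split alone accounts for more than half of the total impurity decrease, which is why ExitRates dominates the ranking.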
In [109]:
importances = best_model2.feature_importances_
indices = np.argsort(importances)

plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
  • ExitRates and ProductRelated_Duration are the two most important features for predicting sessions that contribute to revenue.

Comparing all the decision tree models

In [110]:
comparison_frame = pd.DataFrame({'Model': ['Initial decision tree model', 'Decision tree with hyperparameter tuning',
                                           'Decision tree with post-pruning'],
                                 'Train_Recall': [1, 0.83, 0.89], 'Test_Recall': [0.30, 0.81, 0.86]})
comparison_frame
Out[110]:
                                      Model  Train_Recall  Test_Recall
0               Initial decision tree model          1.00         0.30
1  Decision tree with hyperparameter tuning          0.83         0.81
2           Decision tree with post-pruning          0.89         0.86

The post-pruned decision tree gives the best test recall of the three models compared.

Conclusions

  • We analyzed the "Online Shoppers Purchasing Intention" dataset using different techniques and used a Decision Tree Classifier to build a predictive model for it.
  • The model built can be used to predict if a customer is going to contribute to Revenue generation (by purchasing) or not.
  • We visualized different trees and their confusion matrices to get a better understanding of the models. Easy interpretation is one of the key benefits of Decision Trees.
  • We saw how little data preparation Decision Trees need: this simple model gave good results despite outliers and imbalanced classes (handled here with class weights), which shows the robustness of Decision Trees.
  • ExitRates, ProductRelated_Duration, Month_Nov and BounceRates are the most important variables in predicting the customers that will contribute to the revenue.
  • We established the importance of hyperparameter tuning and pruning in reducing overfitting.

Recommendations

  • According to the decision tree model -

    a) If a customer lands on a page with an exit rate greater than 0.041 there's a very high chance the customer will not be contributing to the revenue.

    b) If a customer lands on a page with an exit rate less than 0.041 and spends more than ~5 minutes (315 seconds) in total on product-related pages, the customer is noticeably more likely to buy something and contribute to the revenue, and the model flags most such sessions as potential buyers.

  • The more time customers spend on Administrative, Informational and ProductRelated pages, the more likely they are to contribute to the revenue. Although the website cannot control the time customers spend, it can enhance the user experience to keep them more engaged.

  • Browsing customers: use the predictive model to identify potential customers (customers likely to buy the product) and offer limited-time coupons/discounts to those customers in real time. This can be focused on months like March, May, November, and December, when traffic is higher and there are more potential buyers.

  • Most of the website's traffic occurs on non-special days, while there is little to no traffic or revenue on special days. The website should launch schemes/offers on special days to attract more customers on those days.

  • May and November were the months with the highest traffic. With further data, it should be investigated which portfolios were running in those months, so that similar portfolios can be created and implemented.

  • Customer retention: member loyalty program initiatives such as special discounts, coupons, etc. can be offered.

  • Better resource management: regular (non-weekend) days are when the website sees the most traffic, so resources such as customer care services can be allocated more heavily to these days.

  • The website should be made more user-friendly, easily accessible, and compatible with other operating systems and browsers, as new visitors seem to struggle with administrative pages.

  • Site engagement: offer a version of the site for users with slower internet, and a consistent, user-friendly mobile design that works across more browsers and operating systems.
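
To make recommendations (a) and (b) usable outside the notebook, the rules of best_model2 can be hand-translated into a plain function. This is a hypothetical sketch: the function name and arguments are illustrative, the thresholds are the 2-decimal values printed by export_text (the root threshold is quoted as 0.041 above), so sessions close to a threshold may be classified differently from best_model2.predict, which should remain the reference:

```python
def predict_revenue(exit_rate, product_related_duration, month, bounce_rate, administrative):
    """Hypothetical hand-written mirror of the best_model2 rules.

    Thresholds are the rounded values printed by export_text, so sessions near
    a threshold may be classified differently from best_model2.predict.
    `month` uses the dataset's month labels, e.g. "Mar", "May", "Nov", "Dec".
    Returns 1 if the session is flagged as a potential buyer, else 0.
    """
    if exit_rate > 0.04:
        return 0  # both leaves of this branch predict class 0
    if product_related_duration <= 315.08:
        # Short product browsing: flagged only outside March, May and December.
        return 0 if month in ("Mar", "May", "Dec") else 1
    if month == "Nov":
        return 1  # both November leaves predict class 1
    if bounce_rate <= 0.0:
        return 1
    # Bounced sessions are flagged only if at least one administrative page was visited.
    return 1 if administrative > 0.5 else 0


# Example: a November session with a low exit rate and ~10 minutes on product pages.
print(predict_revenue(0.01, 600, "Nov", 0.02, 0))  # 1
```

A rule function like this is easy to audit and to run in real time (for example, to decide when to show a coupon), at the cost of having to be updated by hand whenever the model is retrained.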